Hibernate Traps: A Leaky Abstraction

Introduction

Welcome to my new series of articles that will take on the various traps and pitfalls we can encounter when using Hibernate. This is not another rant about why Hibernate is bad and why you shouldn't use it. Like it or not, Hibernate is one of the most widely used technologies in the Java world and sooner or later every Java developer will have to use it in some way.

Nevertheless, it is a huge and complicated framework that, in order to be as generic as possible, has a lot of behaviors ranging from not so obvious to downright weird and quirky. It's better to be prepared and to know what we're dealing with and how it works underneath. That's why my first article in the series is entitled...

A Leaky Abstraction

Hibernate's main selling point for the majority of Java developers is that it takes care of the SQL side of things: we don't have to get our hands dirty writing SQL Statements and can just use pretty Java objects instead. The funny thing is that one of the main contributors to Hibernate, Vlad Mihalcea, is a great fan of SQL and always emphasizes that Hibernate is not a drop-in, one-to-one replacement for SQL.

The reality is that in order to properly use Hibernate and not fall victim to either performance issues or unexpected behaviors, we need to know not only how SQL works, but also how Hibernate itself works underneath. That's why in my book, Hibernate is one of the most leaky abstractions in the Java world.

According to Wikipedia: "In software development, a leaky abstraction is an abstraction that leaks details that it is supposed to abstract away." That's exactly the case with Hibernate. Whether you want to use it as a drop-in replacement for SQL or you want to mix it with Native Queries, its main shtick is that you can use Java code to perform all (or at least most) Persistence operations, which will then be translated to SQL and executed against your Database.

This assumption alone has a great implication - for the abstraction to actually serve its purpose, whatever is logical from a Java point of view should also work correctly (and intuitively) in Hibernate. That's sadly not the case. Code that looks perfectly fine can cause unforeseeable side effects and even Runtime Exceptions that are potentially hard to catch and/or debug, even with automated tests.

To prove my point, I've created a GitHub project containing all the code from this series in an executable form. An example depicting and proving my point of Hibernate being a Leaky Abstraction can be found here. Feel free to clone the project, execute it and fiddle around with it. Going forward I will reference the code from that example.

Instruction call order, Dirty Checking and Auto-flush conundrum

For Hibernate to deliver the most automatic and 'magical' experience, it uses two techniques to translate the Java code execution flow into a series of SQL statements:

Dirty Checking

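Whenever you call the EntityManager::persist() method or fetch an Entity instance using the EntityManager, you get what's called a Managed Object. Hibernate tracks all changes to the properties (field values) of that instance, by default by comparing its current state with a snapshot taken when it was loaded or last flushed, so that they can be transparently persisted to the Database. Note that a Managed Object is not necessarily a Proxy: persist() makes the very instance you passed in managed, while Proxies are mainly used for lazily loaded references.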

Auto-flushing

Whenever you want to execute a Query (for example a JPQL or Criteria Query), Hibernate (by default) will iterate over all Managed Objects it has cached in the current Session and see if any of them is 'dirty' - meaning that its properties have been altered since the time it was loaded or last flushed. If so, prior to executing the requested Query, Hibernate will flush the pending changes to the Database. This ensures that the requested Query is executed against the most up-to-date data available in the scope of the current Transaction.
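To make the interplay of these two mechanisms more tangible, here is a tiny toy model in plain Java. It is only a sketch under my own assumptions: ToySession, Item and the logged statement strings are hypothetical names made up for illustration and are not Hibernate's API or implementation. It assumes snapshot comparison as the dirty checking strategy and flushes before every query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class ToyAutoFlush {

    /** A trivial mutable "entity" with a single tracked field. */
    static final class Item {
        String name;
        Item(String name) { this.name = name; }
    }

    /** Pairs an entity with the state that was last written to the "database". */
    static final class Managed {
        final Item entity;
        String snapshot;
        Managed(Item entity) {
            this.entity = entity;
            this.snapshot = entity.name;
        }
    }

    /** Hypothetical stand-in for a persistence context; NOT Hibernate's API. */
    static final class ToySession {
        private final List<Managed> managed = new ArrayList<>();
        // Statements in the order in which they would reach the database.
        final List<String> statementLog = new ArrayList<>();

        void manage(Item item) {
            managed.add(new Managed(item));
        }

        /** Dirty check: compare the current state against the stored snapshot. */
        void flush() {
            for (Managed m : managed) {
                if (!Objects.equals(m.entity.name, m.snapshot)) {
                    statementLog.add("UPDATE item SET name = '" + m.entity.name + "'");
                    m.snapshot = m.entity.name; // the entity is clean again
                }
            }
        }

        /** Auto-flush: pending changes are written before the query runs. */
        void query(String queryText) {
            flush();
            statementLog.add("SELECT for: " + queryText);
        }
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        Item item = new Item("old");
        session.manage(item);

        item.name = "new"; // nothing is written yet, the change lives only in memory
        session.query("select i from Item i");

        // The UPDATE is logged before the SELECT, although the code never asked for it.
        session.statementLog.forEach(System.out::println);
    }
}
```

The point of the sketch is the ordering: the write happens at the moment of the next query, not at the line where the field was changed.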

In many cases these mechanisms work like a charm. However, it is perfectly possible to write a piece of code that looks fine from a Java point of view, but triggers a PropertyValueException or even an SQLException, depending on whether our Entity Mappings are completely in sync with our Database Schema.

The Example

OK, so let's go over my example, which is all about adding a new Item to the Shopping Cart. Let's see the code that works as intended:

    public void addProductToCart_correctInstructionOrder(Long cartId, String productName, Long quantity) {
        //fetch a Managed instance of ShoppingCart
        ShoppingCart shoppingCart = getCart(cartId);
        //create new CartItem instance
        CartItem ci = new CartItem();
        //populate CartItem instance with data
        ci.setProduct(getProductByName(productName));
        ci.setQuantity(quantity);
        //add Item to the Cart
        shoppingCart.addItem(ci);
    }

The flow is simple and self-explanatory. But what if we change the order of some method calls? After all, Java is an Object Oriented Language, so this shouldn't matter, right? Let's try to add the Item before populating it:

    public void addProductToCart_wrongInstructionOrder(Long cartId, String productName, Long quantity) {
        //fetch a Managed instance of ShoppingCart
        ShoppingCart shoppingCart = getCart(cartId);
        //create new CartItem instance
        CartItem ci = new CartItem();
        //add Item to the Cart
        shoppingCart.addItem(ci);
        //populate CartItem instance with data
        ci.setProduct(getProductByName(productName));
        ci.setQuantity(quantity);
    }

Looking at the definition of the 'addItem' method, everything should work:

    public void addItem(CartItem item) {
        items.add(item);
        item.setShoppingCart(this);
    }

The 'items' property is just a plain old ArrayList after all, right?

    @OneToMany(mappedBy = "shoppingCart", cascade = CascadeType.ALL)
    private List<CartItem> items = new ArrayList<>();

In reality, it's not as simple as it looks. Although you see an ArrayList in the code, once you fetch a Managed Instance from Hibernate, this List will become a PersistentBag instance - one of Hibernate's Persistent Collections. Upon calling the 'addItem' method, the PersistentBag registers the change, which marks the ShoppingCart as 'dirty' - meaning that its state (and, because of CascadeType.ALL, the new CartItem along with it) has to be flushed to the Database before any Query can be performed.

I don't know if you've noticed, but populating the Cart Item involves calling the 'getProductByName' method. It fetches the Product instance by the Product name:

    private Product getProductByName(String productName) {
        return em.createQuery("select p from Product p where p.name = :name", Product.class)
                .setParameter("name", productName)
                .getSingleResult();
    }

So guess what... If we add the Item to the Cart before populating it, Hibernate will try to persist it into the Database before executing the Query used for Item population. At that point the Item still has neither a Product nor a quantity. What happens now? It depends on your Database Schema.

If you have any kind of sane Schema configuration, you have set all the columns of the Cart Item Table to not accept null values. In such a case, depending on whether your Entity mappings reflect that by having the 'nullable' flag set to 'false', you will get either:

PropertyValueException - because Hibernate will detect the attempt to persist a null value into a non-nullable column before sending the SQL Statement to the Database
SQLException - because the Database will refuse to execute the SQL Statement issued by Hibernate on account of a Constraint Violation
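The whole failure scenario can be replayed without Hibernate or a Database. The following is a deliberately simplified sketch: ToyCart, its statement log and the IllegalStateException standing in for PropertyValueException are all my own hypothetical stand-ins, not the real classes from the example project. It only models the one thing that matters here, namely that the lookup query flushes pending inserts first.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyFlushOrder {

    static final class CartItem {
        String product; // stands in for the non-nullable Product reference
        Long quantity;  // stands in for the non-nullable quantity column
    }

    /** Hypothetical stand-in for the managed cart plus its persistence context. */
    static final class ToyCart {
        private final List<CartItem> pendingInserts = new ArrayList<>();
        final List<String> statementLog = new ArrayList<>();

        /** Adding an item only registers it; nothing is written yet. */
        void addItem(CartItem item) {
            pendingInserts.add(item);
        }

        /** Like getProductByName in the article: a query, so it flushes first. */
        String findProductByName(String name) {
            flush();
            statementLog.add("SELECT product");
            return name;
        }

        /** Writes pending items, rejecting null values like a not-null check would. */
        void flush() {
            for (CartItem item : pendingInserts) {
                if (item.product == null || item.quantity == null) {
                    throw new IllegalStateException("not-null property references a null value");
                }
                statementLog.add("INSERT cart_item");
            }
            pendingInserts.clear();
        }
    }

    static void correctOrder(ToyCart cart, String productName, Long quantity) {
        CartItem ci = new CartItem();
        ci.product = cart.findProductByName(productName); // query runs, nothing pending
        ci.quantity = quantity;
        cart.addItem(ci);
    }

    static void wrongOrder(ToyCart cart, String productName, Long quantity) {
        CartItem ci = new CartItem();
        cart.addItem(ci);                                  // empty item is now pending
        ci.product = cart.findProductByName(productName); // flush hits the empty item
        ci.quantity = quantity;
    }

    public static void main(String[] args) {
        ToyCart good = new ToyCart();
        correctOrder(good, "book", 2L);
        good.flush(); // stands in for the flush at commit time
        System.out.println(good.statementLog);

        try {
            wrongOrder(new ToyCart(), "book", 2L);
        } catch (IllegalStateException e) {
            System.out.println("wrong order failed: " + e.getMessage());
        }
    }
}
```

Both methods contain exactly the same calls. Only their order differs, and only one of them survives the hidden flush.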

This example is very simplistic and the fix is easy. In reality though, when dealing with very large code bases, detecting the root cause of such errors may not be so easy. The fact that the code looks perfectly fine at first sight makes it very easy to introduce unwanted, accidental regressions simply by changing the order of some method calls. Tracking all the places where we think that Hibernate will perform the flush can be borderline impossible.

This also may not be so easy to catch with automated tests, because many developers use Transactional Integration Tests - another trap that I will describe in my next article.

The Solution(s)

OK, so how can we avoid such situations? It depends on how much 'magic' we are prepared to lose in order to regain control of our application. I will list the possible solutions from the least to the most invasive:

Add business validation logic to your Entities. Forget about having the 'thin Entity layer' that only contains getters and setters. In our example, putting some non-null validations in the 'addItem' method would make all the difference in the world - it would be readable and obvious that adding an empty Item to the Cart will trigger a meaningful Exception. Ideally we would be consciously adding an automated Test Case that covers this scenario, making it a part of the specification.
Try to avoid mixing read and write SQL operations. In the above example everything would work correctly if we fetched the data needed to populate the CartItem before adding it to the Cart. Mixing Selects and Insert/Update/Delete statements also tends to create long Transactions that have a negative effect on your overall Database performance due to possible Lock Contention. Use Optimistic Locking whenever possible to maintain data consistency without having to explicitly lock resources in the Database. For pessimistic locking scenarios, think about using an external Distributed Lock, for example the one available in Spring Integration. This will nicely decouple your concurrency control logic from your persistence logic and ensure that you lock on logical Domain Objects rather than on Database resources.
Switch Hibernate's flush mode from AUTO to COMMIT. This may hurt a little, especially for preexisting code. With this change, Hibernate flushes automatically only at the end of a given Transaction. Any mid-Transaction flushes have to be triggered manually via EntityManager::flush(). It will make your code a little more ugly and verbose, but you will regain control over the time and place of executing the SQL statements. This is a great power that shouldn't be handed over to Hibernate lightly, as it can influence the overall performance of your application. Auto-flushing turns the code into kind of a 'hostage' of Hibernate, in the sense that once we write our application using the auto-flushing feature, it will be very hard to back out of using it.
Replace Hibernate with jOOQ or QueryDSL. This is my favored option, as I personally prefer to maintain total control of all my SQL Statements. I did my share of Hibernate-based development and I know that everything you can possibly do with a Database, you can do using Hibernate. I just think that there are now more modern and cleaner ways to do it. Back in the day when the only alternative to Hibernate was the JDBC API, I was gladly using it because, let's face it, the JDBC API is horrible. Nowadays we have many options to choose from, and each one of us should consider whether the benefits of using Hibernate outweigh the need to have deep knowledge of how it works and to verify every step of the way that it doesn't destroy our application's performance or cause weird behaviors.
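As an illustration of the first and least invasive solution, here is how a validating 'addItem' could look. Treat it as a minimal sketch under my own assumptions: the classes are stripped of all mappings, the product is reduced to a plain String, and the choice of IllegalArgumentException and the exact messages are hypothetical, not taken from the example project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class ValidatedCart {

    /** Simplified item without any mappings; product is a String for brevity. */
    static final class CartItem {
        private String product;
        private Long quantity;
        private ShoppingCart shoppingCart;

        String getProduct() { return product; }
        void setProduct(String product) { this.product = product; }
        Long getQuantity() { return quantity; }
        void setQuantity(Long quantity) { this.quantity = quantity; }
        ShoppingCart getShoppingCart() { return shoppingCart; }
        void setShoppingCart(ShoppingCart cart) { this.shoppingCart = cart; }
    }

    static final class ShoppingCart {
        private final List<CartItem> items = new ArrayList<>();

        /** Rejects incomplete items up front instead of failing at some later flush. */
        void addItem(CartItem item) {
            Objects.requireNonNull(item, "item must not be null");
            if (item.getProduct() == null) {
                throw new IllegalArgumentException("CartItem needs a product before it is added to the cart");
            }
            if (item.getQuantity() == null) {
                throw new IllegalArgumentException("CartItem needs a quantity before it is added to the cart");
            }
            items.add(item);
            item.setShoppingCart(this);
        }

        int itemCount() { return items.size(); }
    }

    public static void main(String[] args) {
        ShoppingCart cart = new ShoppingCart();
        try {
            cart.addItem(new CartItem()); // the "wrong order" case from the article
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        CartItem ci = new CartItem();
        ci.setProduct("book");
        ci.setQuantity(1L);
        cart.addItem(ci);
        System.out.println("items in cart: " + cart.itemCount());
    }
}
```

With a check like this, the wrong call order fails immediately at the 'addItem' line with a message that names the actual problem, and it does so even in tests that never hit a Database.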

Conclusion

I am far from hating Hibernate. It's a tool with a huge user base and it has contributed to the Java Enterprise landscape in many great ways. Nevertheless, it's a huge, leaky abstraction and it's very hard to utilize its power correctly. I personally choose not to use it anymore and I feel kind of liberated from an entire class of issues. That said, it is perfectly possible to develop a high-performing, enterprise-grade application using Hibernate as the persistence provider. You just need to follow some rules:

You have to know how it works underneath.
You can't treat it as a black box that will magically turn your Java code into SQL Statements.
You have to verify what SQL Statements are executed by it against your Database. That includes every refactoring, even one that theoretically has nothing to do with Persistence layer of your application.
Don't replace all of your Native SQL Queries with JPQL and/or Criteria API. Use Window Functions, Unions, CTEs and all other native features that offer superior performance.
