Hibernate Traps: Merge versus Update

 Introduction

Welcome to another installment of Hibernate Traps. This time we will take a closer look at one of the most basic and commonly used features - updating Database state with a detached Entity. This time the title is also kinda stretched, because this Trap belongs more to the general JPA realm, than to Hibernate. However because Hibernate is the most commonly used implementation of JPA, and I'm not entirely sure whether all implementations of JPA behave the same way as Hibernate in regard to merging, I will again carry on with the title as it is.


What JPA is not telling You...

Merging a detached Entity is one of the most common operations in any Database-driven application. Any time we get a REST Request that updates data in our application, it is very likely we will use merge operation and expect Hibernate to generate one or more Update SQL statements. This may not be the case if we need to find Entities using some search criteria prior to updating them, because then we will have to first load managed Entities into the EntityManager and then update them. In such scenario Update Statements will be generated by the Dirty Checking logic rather than merging. However in a scenario when we have a well known ID of an Entity we want to update, merging will be the way to go. Or at least that's what many developers think.

Let's take a look at the JPA docs for the merge method:

<T> T merge(T entity)

Merge the state of the given entity into the current persistence context.

Parameters:
    entity - entity instance
Returns:
    the managed instance that the state was merged to

OK so what exactly does it mean to 'merge' something? Well, as it turns out it's more than just an Update. It will:

  • perform Insert or Update SQL Operation, depending on whether the detached Entity passed to the merge method has ID field(s) populated
  • in case of Update SQL Operation, Hibernate will execute an additional Select Statement for each merged Entity

Although the first of the above actions is kinda expected, the second one can have very serious performance implications in a form of...

 

N+1 Selects, baby!

Well, it's actually N Selects + 1 Update. But let's take it one step at a time and check out some Code Examples. Let's say we want to update a state of a single detached Entity:

//WHEN: User is being updated using EntityManager::merge
jpaTransactionWrapper.doInTransaction(em -> {
   Person person = new Person("John Doe 0");
   person.setName("John Merge 0");
   person.setId(0L);
   em.merge(person);
});

Our naive expectations will be that Hibernate issues a simple Update SQL Statement to the Database. It's not that we need anything else, right? We have an ID of the Entity we want to update and we know what data we want to update. The unexpected reality is however that Hibernate will generate 2 SQL Statements:

//THEN: 1 Select + 1 Update are executed against Database
SQLStatementLoggingAppender.printQueriesContaining("execute");
assertEquals(2, SQLStatementLoggingAppender.countQueriesContaining("execute"));
There are 2 SQL Statement(s) collected by the Logger, that contain 'execute':
java.sql.PreparedStatement.executeQuery: select person0_.PERSON_ID as person_i1_0_0_, person0_.NAME as name2_0_0_ from PERSONS person0_ where person0_.PERSON_ID=0;
java.sql.PreparedStatement.executeUpdate: update PERSONS set NAME='John Merge 0' where PERSON_ID=0;

How do I know that? I use a DataSource proxy to log all SQL Statements generated by Hibernate:

<dependency>
    <groupId>org.jdbcdslog</groupId>
    <artifactId>jdbcdslogexp2</artifactId>
    <version>2.1</version>
    <scope>test</scope>
</dependency>
<property name="javax.persistence.jdbc.url" value="jdbc:jdbcdslog:h2:mem:test?targetDriver=org.h2.Driver"/>

OK so we now know that merge operation consists of Select+Update pair of SQL Statements. But how do we go from here to N+1 problem? The answer to that question is

 

JDBC Batching

Imagine that for some reason we have not 1 detached Entity to merge, but a 1000 of them. It would be beneficial from performance point of view to enable JDBC Batching in Hibernate:

<property name="hibernate.jdbc.batch_size" value="1000"/>
<property name="hibernate.order_inserts" value="true"/>
<property name="hibernate.order_updates" value="true"/>
<property name="hibernate.batch_versioned_data" value="true"/>

But what will happen when we try to use merge operation on a 1000 detached Entities?:

//WHEN: 1000 Users are updated using EntityManager::merge
jpaTransactionWrapper.doInTransaction(em -> {
   for (long i = 0; i < 1000; i++) {
      Person person = new Person("John Merge " + i);
      person.setId(i);
      em.merge(person);
   }
});

Yup, you guessed it. 1001 SQL Statements:

//THEN: 1000 Selects + 1 Update are executed against Database
SQLStatementLoggingAppender.printQueriesContaining("execute");
assertEquals(1001, SQLStatementLoggingAppender.countQueriesContaining("execute"));
There are 1001 SQL Statement(s) collected by the Logger, that contain 'execute':
java.sql.PreparedStatement.executeQuery: select person0_.PERSON_ID as person_i1_0_0_, person0_.NAME as name2_0_0_ from PERSONS person0_ where person0_.PERSON_ID=0;
java.sql.PreparedStatement.executeQuery: select person0_.PERSON_ID as person_i1_0_0_, person0_.NAME as name2_0_0_ from PERSONS person0_ where person0_.PERSON_ID=1;
...
java.sql.PreparedStatement.executeQuery: select person0_.PERSON_ID as person_i1_0_0_, person0_.NAME as name2_0_0_ from PERSONS person0_ where person0_.PERSON_ID=999;
java.sql.Statement.executeBatch: 

This for most Hibernate users is entirely unexpected and unacceptable behavior, especially for applications that perform batch data processing, or handle large numbers of Requests.

OK, so how do we solve it? Fortunately JPA creators have enabled us to use implementation-specific features in a rather clean way, by using the unwrap method:

Session hibernateSession = entityManager.unwrap(Session.class);

Having direct access to Hibernate's Session instance enables us to use features not available in JPA Specification. One of those is update method:

//WHEN: 1000 Users are updated using Session:update
jpaTransactionWrapper.doInTransaction(em -> {
    for (long i = 0; i < 1000; i++) {
       Person person = new Person("John Update " + i);
       person.setId(i);
       em.unwrap(Session.class).update(person);
    }
});

Changing the operation from JPA merge to Hibernate update drastically changes the behavior of our code:

//THEN: only one Update Statement is executed against Database
SQLStatementLoggingAppender.printQueriesContaining("execute");
assertEquals(1, SQLStatementLoggingAppender.countQueriesContaining("execute"));
There are 1 SQL Statement(s) collected by the Logger, that contain 'execute':
java.sql.Statement.executeBatch: 

 

Conclusion

Once again it was proven, that in order to efficiently use Hibernate, we need to know how it works internally. Seemingly obvious and self-explanatory Java methods can hide unexpected and undesirable behavior that will most likely be uncovered too late in the development process, forcing us to quickly diagnose and fix Production slowness issues. Better to come prepared and to become a Hibernate expert before that happens.

Comments