ORM Tips: Separating Business Logic and Data Logic

Introduction

Welcome to another one of my database-centric articles. This time, although all examples will be written in Java, the topic at hand is so generic and universal that it can be applied to any programming language with a Database Abstraction library that can be called an ORM, be it Java with Hibernate, PHP with Doctrine or Ruby with Active Record.

Modern Application Architecture has drifted away from layered Monoliths towards more flexible and manageable micro-services that follow the Hexagonal approach. Even if you haven't jumped on the micro-services train, the focus these days tends to be on good modularization, vertical slicing and defining proper Bounded Contexts and Aggregate Roots within the Business Domain of the Product being developed. But does that mean that a Service or a Module can just be a bag of poorly structured, interconnected Classes with no proper, repeatable and meaningful naming, packaging and responsibilities? Can we, even when using DDD, CQRS and/or Event Sourcing, forget about inner Layers that, when properly defined, make the code easy to navigate and resilient to future changes? I'd argue that such an approach results in poor and short-sighted Architecture.

Layer Cake

The traditional approach would have us divide the code into:

API Layer
Service Layer
Repository Layer

The importance of a well-defined API cannot be overstated, but in this article I will focus on the two lower Layers - the ones responsible for Business and Data Logic. Why? Because introducing an ORM to any Application code base can seriously damage its Architecture, leaving it open to two very dangerous phenomena:

unwanted side effects of executing the code
inherent lack of code flexibility

But before I explain anything, let's set up, as always, an

Example

Below you can see an ERD (Entity Relationship Diagram). It depicts a fragment of a Database structure that could belong to a simple E-Commerce Solution.

Based on it, let's try to come up with a simple piece of Business Code that could be responsible for fetching the Details of a placed Order:

import java.math.BigDecimal;
import java.util.List;
import java.util.stream.Collectors;

public class OrderService {

    private final OrderRepository repository;

    public OrderService(OrderRepository repository) {
        this.repository = repository;
    }

    public OrderDetailsDto getOrderDetails(Long orderId) {
        Order order = repository.getOrderById(orderId);
        OrderDetailsDto details = new OrderDetailsDto(order.getId(), order.getStatus());
        details.setItemDetails(getItemDetails(order));
        details.setPaymentDetails(getPaymentDetails(order));
        details.setShipmentDetails(getShipmentDetails(order));
        details.setTotal(getTotal(order));
        return details;
    }

    private List<OrderItemDetailDto> getItemDetails(Order order) {
        return order.getItems().stream()
                .map(item -> {
                    Product product = item.getProduct();
                    Category category = product.getCategory();
                    return new OrderItemDetailDto(
                            product.getId(),
                            product.getName(),
                            category.getName(),
                            item.getQuantity(),
                            item.getUnitPrice()
                    );
                }).collect(Collectors.toList());
    }

    private OrderPaymentDetailDto getPaymentDetails(Order order) {
        Payment payment = order.getPayment();
        PaymentMethod method = payment.getMethod();
        OrderPaymentDetailDto details = new OrderPaymentDetailDto(
                payment.getStatus(), method.getName()
        );
        details.setHistory(order.getPaymentHistory().stream()
                .map(ph -> new OrderPaymentHistoryDto(ph.getStatus(), ph.getCreatedAt()))
                .collect(Collectors.toList()));
        return details;
    }

    private OrderShipmentDetailDto getShipmentDetails(Order order) {
        Shipment shipment = order.getShipment();
        ShipmentMethod method = shipment.getMethod();
        Address address = shipment.getAddress();
        OrderShipmentDetailDto details = new OrderShipmentDetailDto(shipment.getStatus(), method.getName(),
                address.getAddress(), address.getCity(), address.getPostCode(), address.getCountry());
        details.setHistory(order.getShipmentHistory().stream()
                .map(sh -> new OrderShipmentHistoryDto(sh.getStatus(), sh.getCreatedAt()))
                .collect(Collectors.toList()));
        return details;
    }

    private BigDecimal getTotal(Order order) {
        return order.getItems().stream()
                .map(item -> item.getUnitPrice()
                        .multiply(BigDecimal.valueOf(item.getQuantity()))
                )
                .reduce(BigDecimal::add)
                .orElse(BigDecimal.ZERO);
    }
}

It may not be very sophisticated, but it gets the job done and, more importantly, depicts the usual way we deal with Data fetched from the Database in the form of Managed Entities with all of their defined Relations. So what exactly is wrong with that code? We are, after all, using the Repository Layer to fetch the necessary Data, which is then used by the Service Layer, right? OK, let's start with

Side Effects

The logic in the above example reads like plain, regular Java code. Its flow is easy to follow and we know what the result of its execution will be. The methods are small and correctly named. So, what's wrong with it?

It can potentially execute multiple SQL Queries, and we cannot tell for certain which lines will do that or how many Queries there will be. Not from reading that piece of code alone, at least. The fact that our Repository Layer returns a Managed Entity seriously distorts the responsibilities of both Layers. Each time you call an Entity Method that returns one or more Related Entities (for example, a lazily loaded association), there is a chance that the Application will have to execute an SQL Statement in order to complete the execution of the Business Logic. In getItemDetails() alone, order.getItems(), item.getProduct() and product.getCategory() may each trigger a Query, the last two potentially once per Order Item.
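To make the hidden-query mechanism concrete, here is a minimal, ORM-free sketch of how a lazily loaded relation can hide a Query behind an ordinary getter. Everything in it (`QueryLog`, `Lazy`, and the simplified `Order`, `OrderItem` and `Product` classes) is hypothetical and only mimics what an ORM proxy does; a real ORM such as Hibernate generates such proxies for you and issues actual SQL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class LazyLoadingSketch {

    // Hypothetical stand-in for the database: it only prints and counts "queries".
    static final class QueryLog {
        private int count;

        void record(String sql) {
            count++;
            System.out.println("SQL: " + sql);
        }

        int count() { return count; }
    }

    // Hypothetical lazy holder, mimicking an ORM proxy: runs the loader on first access, then caches.
    static final class Lazy<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean initialized;

        Lazy(Supplier<T> loader) { this.loader = loader; }

        T get() {
            if (!initialized) {
                value = loader.get();
                initialized = true;
            }
            return value;
        }
    }

    static final class Product {
        private final String name;

        Product(String name) { this.name = name; }

        String getName() { return name; }
    }

    static final class OrderItem {
        private final Lazy<Product> product;

        OrderItem(long productId, QueryLog log) {
            this.product = new Lazy<>(() -> {
                log.record("SELECT * FROM product WHERE id = " + productId);
                return new Product("Product " + productId);
            });
        }

        // Reads like a plain getter, but may hit the database.
        Product getProduct() { return product.get(); }
    }

    static final class Order {
        private final Lazy<List<OrderItem>> items;

        Order(long orderId, int itemCount, QueryLog log) {
            this.items = new Lazy<>(() -> {
                log.record("SELECT * FROM order_item WHERE order_id = " + orderId);
                List<OrderItem> result = new ArrayList<>();
                for (long productId = 1; productId <= itemCount; productId++) {
                    result.add(new OrderItem(productId, log));
                }
                return result;
            });
        }

        // Also reads like a plain getter.
        List<OrderItem> getItems() { return items.get(); }
    }

    // "Business code" with no visible data access that still issues 1 + N queries
    // (the query that loaded the Order itself is not simulated here).
    public static int countQueriesForProductNames(int itemCount) {
        QueryLog log = new QueryLog();
        Order order = new Order(42L, itemCount, log);
        for (OrderItem item : order.getItems()) { // 1 query for the items
            item.getProduct().getName();          // 1 query per item
        }
        order.getItems();                         // already loaded: no new query
        return log.count();
    }

    public static void main(String[] args) {
        System.out.println("Queries issued: " + countQueriesForProductNames(3));
    }
}
```

Note that the second call to getItems() issues nothing, which is exactly why a given getter only has a "chance" of querying: whether it does depends on what happened earlier.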

Why is that a Side Effect? Because nothing in the code makes it apparent that such an action may occur. The reader may not even be (and, more importantly, shouldn't have to be) aware that they're dealing with Entities here.

What kind of Side Effect is it? I think it falls under Action at a Distance, which occurs when "behavior in one part of a program varies wildly based on difficult or impossible to identify operations in another part of the program". That is pretty much the case here. Although we're not modifying any distant state, the behavior of our Business Code will vary depending on the (very often hard to grasp and possibly scattered) configuration of multiple Entities that are defined in a completely separate part of the code base.

Why is it that bad? We have to remember that querying a Database is, in fact, an integration with an external System. Would you want your code to perform Network Lookups just because you called an equals() method? Surely not. Our example of Entity usage is just as bad. Its behavior can be:

unexpected - while reading the code you don't get any clues that such an action may be performed.
confusing - you're forced to cross-reference the code with other, possibly distant code - most likely the ORM configuration.
dependent on external factors - which lines of code will execute SQL, and how many Queries will be executed, depends on the configuration of your ORM, and that, in the worst-case scenario, can even be dynamic and Runtime dependent.
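The "dependent on external factors" point can be illustrated with another ORM-free sketch: the business method productNames() is identical in both runs, and only a mapping option declared elsewhere decides how many Queries it causes. The FetchMode enum and the simulated loader are hypothetical; they only loosely resemble the eager/lazy fetch settings that real ORMs offer.

```java
import java.util.ArrayList;
import java.util.List;

public class FetchConfigSketch {

    // Hypothetical mapping option, normally declared far away from the business code
    // (e.g. in an annotation or a mapping file).
    public enum FetchMode { EAGER, LAZY }

    // Hypothetical stand-in for the database: it only counts "queries".
    static final class Database {
        private int queries;

        void query(String sql) { queries++; }

        int queries() { return queries; }
    }

    static final class OrderItem {
        private final long productId;
        private final Database db;
        private String productName; // set up front (EAGER) or on first access (LAZY)

        OrderItem(long productId, FetchMode productFetch, Database db) {
            this.productId = productId;
            this.db = db;
            if (productFetch == FetchMode.EAGER) {
                this.productName = "Product " + productId; // pretend it came from the JOIN
            }
        }

        String getProductName() {
            if (productName == null) { // only happens under LAZY
                db.query("SELECT name FROM product WHERE id = " + productId);
                productName = "Product " + productId;
            }
            return productName;
        }
    }

    // Hypothetical data access: the SQL it issues depends on the mapping, not on the caller.
    static List<OrderItem> loadItems(long orderId, int itemCount, FetchMode productFetch, Database db) {
        db.query(productFetch == FetchMode.EAGER
                ? "SELECT ... FROM order_item JOIN product ... WHERE order_id = " + orderId
                : "SELECT ... FROM order_item WHERE order_id = " + orderId);
        List<OrderItem> items = new ArrayList<>();
        for (long productId = 1; productId <= itemCount; productId++) {
            items.add(new OrderItem(productId, productFetch, db));
        }
        return items;
    }

    // Identical business code for both configurations.
    static List<String> productNames(List<OrderItem> items) {
        List<String> names = new ArrayList<>();
        for (OrderItem item : items) {
            names.add(item.getProductName());
        }
        return names;
    }

    public static int queriesFor(FetchMode productFetch, int itemCount) {
        Database db = new Database();
        productNames(loadItems(42L, itemCount, productFetch, db));
        return db.queries();
    }

    public static void main(String[] args) {
        System.out.println("EAGER: " + queriesFor(FetchMode.EAGER, 3) + " queries"); // 1
        System.out.println("LAZY:  " + queriesFor(FetchMode.LAZY, 3) + " queries");  // 1 + 3
    }
}
```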

OK, so can't we fix that by applying something like a JPA Entity Graph? No. For one thing, not every ORM offers such a solution in the first place but, first and foremost, the Service Layer should be completely oblivious to where the Data comes from and how it is retrieved. And that brings us to an even more dreaded issue with Entity misuse.

Lack of Flexibility

Our Application will change over time. New Requirements may appear during its life cycle, or we may have to increase its scalability and/or performance. It's very important to understand that, although implementing and/or fixing Business Logic is usually not that hard (at least from a purely technical point of view), the same cannot be said about making the application scalable and/or ensuring its performance. Why? Because, in my experience, almost 9 out of 10 cases of poor performance and/or scalability issues are connected to poorly written Data Access code.

Let's take a look at our code example again. What if, along the way, we decide to separate the Data Store that keeps our Orders from the one that keeps our Product Catalog? We may move our Products to a different Relational Database or even to a NoSQL Store. Or what if the Database Structure becomes so complex over time that we have to resort to a Materialized View or, even better, a Stored Procedure?

If our Business Logic is tightly coupled with our Data Logic, it will have to be changed every time we want to change or upgrade our Data Access code. That requires much more work, may introduce many regressions, and will surely consume more time and money.

Even worse, such a situation makes tracking down a potential performance issue very painful: "Software bugs due to action at a distance may arise because a program component is doing something at the wrong time, or affecting something it should not. It is very difficult, however, to track down which component is responsible. Side effects from innocent actions can put the program in an unknown state (...)"

OK, so how could we fix the code from the example so that it yields no Side Effects and is more flexible? It's easy, but it requires writing some code that many will instantly resent as boilerplate. We need to introduce a Contract between our Business code and our Data Access code. That Contract has to define DTOs that will:

Go into and out of the Data Layer,
Contain Data Structures needed to fulfill a concrete Business need,
Be explicitly and informatively named,
Ensure that all Data needed by the Service Layer has already been loaded by the Data Layer.

It may look something like this:

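import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class OrderService {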

    private final OrderRepository orderRepository;

    private final ProductRepository productRepository;

    private final ShipmentRepository shipmentRepository;

    private final PaymentRepository paymentRepository;

    public OrderService(OrderRepository orderRepository, ProductRepository productRepository,
                         ShipmentRepository shipmentRepository, PaymentRepository paymentRepository) {
        this.orderRepository = orderRepository;
        this.productRepository = productRepository;
        this.shipmentRepository = shipmentRepository;
        this.paymentRepository = paymentRepository;
    }

    public OrderDetailsDto getOrderDetails(Long orderId) {
        BaseOrderDataDto order = orderRepository.getBaseData(orderId);
        List<OrderItemDto> orderItems = orderRepository.getItems(orderId);
        OrderDetailsDto details = new OrderDetailsDto(order.getId(), order.getStatus());
        details.setItemDetails(getItemDetails(orderId, orderItems));
        details.setPaymentDetails(getPaymentDetails(orderId));
        details.setShipmentDetails(getShipmentDetails(orderId));
        details.setTotal(getTotal(orderItems));
        return details;
    }

    private List<OrderItemDetailDto> getItemDetails(Long orderId, List<OrderItemDto> orderItems) {
        List<OrderedProductDto> orderedProducts = productRepository.getOrderedProducts(orderId);
        Map<Long, OrderedProductDto> orderedProductsMap = orderedProducts.stream()
                .collect(Collectors.toMap(OrderedProductDto::getProductId, p -> p));
        return orderItems.stream()
                .map(item -> {
                    OrderedProductDto product = orderedProductsMap.get(item.getProductId());
                    return new OrderItemDetailDto(
                            item.getProductId(),
                            product.getProductName(),
                            product.getCategoryName(),
                            item.getQuantity(),
                            item.getUnitPrice()
                    );
                }).collect(Collectors.toList());
    }

    private OrderPaymentDetailDto getPaymentDetails(Long orderId) {
        OrderPaymentDto payment = paymentRepository.getPayment(orderId);
        List<OrderPaymentHistoryDto> paymentHistory = paymentRepository.getPaymentHistory(orderId);
        OrderPaymentDetailDto details = new OrderPaymentDetailDto(
                payment.getStatus(), payment.getMethodName()
        );
        details.setHistory(paymentHistory);
        return details;
    }

    private OrderShipmentDetailDto getShipmentDetails(Long orderId) {
        OrderShipmentDto shipment = shipmentRepository.getShipment(orderId);
        List<OrderShipmentHistoryDto> shipmentHistory = shipmentRepository.getShipmentHistory(orderId);
        OrderShipmentDetailDto details = new OrderShipmentDetailDto(shipment.getStatus(), shipment.getMethodName(),
                shipment.getAddress(), shipment.getCity(), shipment.getPostCode(), shipment.getCountry());
        details.setHistory(shipmentHistory);
        return details;
    }

    private BigDecimal getTotal(List<OrderItemDto> orderItems) {
        return orderItems.stream()
                .map(item -> item.getUnitPrice()
                        .multiply(BigDecimal.valueOf(item.getQuantity()))
                )
                .reduce(BigDecimal::add)
                .orElse(BigDecimal.ZERO);
    }
}
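The second version depends on DTOs and Repository methods that are not listed in the article. Below is a minimal sketch of the item-related part of that Contract, under a few assumptions: the field types (`int` quantity, `BigDecimal` unit price), the narrowed, item-only `OrderRepository`, and the in-memory lambda, which merely stands in for a real implementation.

```java
import java.math.BigDecimal;
import java.util.List;

public class ContractSketch {

    // DTO coming out of the Data Layer; the field types are assumptions.
    public static final class OrderItemDto {
        private final Long productId;
        private final int quantity;
        private final BigDecimal unitPrice;

        public OrderItemDto(Long productId, int quantity, BigDecimal unitPrice) {
            this.productId = productId;
            this.quantity = quantity;
            this.unitPrice = unitPrice;
        }

        public Long getProductId() { return productId; }
        public int getQuantity() { return quantity; }
        public BigDecimal getUnitPrice() { return unitPrice; }
    }

    // Already flattened: the Category name is a plain String, not a lazily loaded Entity.
    public static final class OrderedProductDto {
        private final Long productId;
        private final String productName;
        private final String categoryName;

        public OrderedProductDto(Long productId, String productName, String categoryName) {
            this.productId = productId;
            this.productName = productName;
            this.categoryName = categoryName;
        }

        public Long getProductId() { return productId; }
        public String getProductName() { return productName; }
        public String getCategoryName() { return categoryName; }
    }

    // Partial contracts: only the item-related methods are shown (getBaseData etc. omitted).
    public interface OrderRepository {
        List<OrderItemDto> getItems(Long orderId);
    }

    public interface ProductRepository {
        List<OrderedProductDto> getOrderedProducts(Long orderId);
    }

    // Same calculation as getTotal(...) in the second version: sum of unitPrice * quantity.
    public static BigDecimal getTotal(List<OrderItemDto> orderItems) {
        return orderItems.stream()
                .map(item -> item.getUnitPrice().multiply(BigDecimal.valueOf(item.getQuantity())))
                .reduce(BigDecimal::add)
                .orElse(BigDecimal.ZERO);
    }

    public static void main(String[] args) {
        // In-memory stand-in; a JPA, jOOQ, native SQL or NoSQL implementation could be used instead.
        OrderRepository orderRepository = orderId -> List.of(
                new OrderItemDto(1L, 2, new BigDecimal("9.99")),
                new OrderItemDto(2L, 1, new BigDecimal("25.00")));
        System.out.println(getTotal(orderRepository.getItems(7L))); // prints 44.98
    }
}
```

Because the DTOs are plain, fully populated objects, calling any of their getters can never reach the database, which is what the fourth requirement of the Contract asks for.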

OK, so what makes the second version better?

While reading the code, you get visible clues about the places where the Data is actually fetched (every time a Repository is used).
Executing the fixed code won't cause any hidden Side Effects.
Business Logic is now completely decoupled from Data Logic, which gives us the freedom to re-implement the Data retrieval code without having to modify the Service Layer*. You can still use your Entities in the implementation of the Data Access Layer. Or not. You can also use Native SQL, jOOQ or even a NoSQL Store.
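The decoupling claim can be checked with a small sketch: one ProductRepository Contract, two interchangeable implementations, and business code that stays untouched when the store changes. Both implementations are hypothetical and in-memory; one merely imitates mapping relational rows, the other imitates reading documents from a NoSQL Store.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SwappableRepositorySketch {

    public static final class OrderedProductDto {
        private final Long productId;
        private final String productName;
        private final String categoryName;

        public OrderedProductDto(Long productId, String productName, String categoryName) {
            this.productId = productId;
            this.productName = productName;
            this.categoryName = categoryName;
        }

        public Long getProductId() { return productId; }
        public String getProductName() { return productName; }
        public String getCategoryName() { return categoryName; }
    }

    // The Contract the business code depends on.
    public interface ProductRepository {
        List<OrderedProductDto> getOrderedProducts(Long orderId);
    }

    // Hypothetical implementation #1: maps "rows" the way a relational query result might look.
    public static final class RowProductRepository implements ProductRepository {
        // each row: orderId, productId, productName, categoryName
        private final List<String[]> rows = List.of(
                new String[] {"7", "1", "Laptop", "Computers"},
                new String[] {"7", "2", "Mouse", "Accessories"},
                new String[] {"8", "3", "Desk", "Furniture"});

        @Override
        public List<OrderedProductDto> getOrderedProducts(Long orderId) {
            return rows.stream()
                    .filter(row -> Long.parseLong(row[0]) == orderId)
                    .map(row -> new OrderedProductDto(Long.parseLong(row[1]), row[2], row[3]))
                    .collect(Collectors.toList());
        }
    }

    // Hypothetical implementation #2: reads "documents" the way a NoSQL Store might keep them.
    public static final class DocumentProductRepository implements ProductRepository {
        private final Map<Long, List<Map<String, String>>> documentsByOrder = Map.of(
                7L, List.of(
                        Map.of("id", "1", "name", "Laptop", "category", "Computers"),
                        Map.of("id", "2", "name", "Mouse", "category", "Accessories")));

        @Override
        public List<OrderedProductDto> getOrderedProducts(Long orderId) {
            return documentsByOrder.getOrDefault(orderId, List.of()).stream()
                    .map(doc -> new OrderedProductDto(
                            Long.parseLong(doc.get("id")), doc.get("name"), doc.get("category")))
                    .collect(Collectors.toList());
        }
    }

    // Business code: depends only on the Contract, so it is untouched when the store changes.
    public static List<String> describeProducts(ProductRepository repository, Long orderId) {
        return repository.getOrderedProducts(orderId).stream()
                .map(p -> p.getProductName() + " (" + p.getCategoryName() + ")")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(describeProducts(new RowProductRepository(), 7L));
        System.out.println(describeProducts(new DocumentProductRepository(), 7L));
    }
}
```

Both calls in main() print the same list, [Laptop (Computers), Mouse (Accessories)]; only the object passed in changes.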

The code may look more bloated and explicit now, and it requires some additional Classes in the code base, but I think that is a small price to pay for a predictable and flexible solution that is far less likely to cause headaches in the long run. When it comes to software development, being explicit is almost always better than being smart.

Conclusion

I understand if you find my point of view too strict, impractical and academic. I know that it's very convenient to just fetch an Entity and be done with it, focusing on implementing and testing Business Requirements. However, I can guarantee that, if you take my approach, it will pay off big time in the long run. Why? Because sooner or later you'll end up with a piece of code that performs poorly because of an incorrectly implemented Data Access aspect, and you will have a hard time fixing it for all the reasons I've described in this article.

All of the above applies not only to a Service-based approach. Maybe your Business Logic resides in Aggregate Roots, or maybe it is handled by a Command Subscriber that fires an Event. Whatever approach you're using, make sure that your Data Access logic is strongly decoupled from the higher Layers and does not cause any Side Effects.

I hope you'll find it helpful. Thanks for reading!

* Will that always be the case? Perhaps not. But even if you have to make some changes to the Service Layer, they will be orders of magnitude smaller than if your Data Access code were entangled with your Business code.
