Scope Management in Java Architecture
This is an English translation of my original article, which was published on the Polish tech blog of JustJoinIT.
No Man's Land
Scope has many different meanings in programming, depending on the context. Let's focus on understanding this term in the context of object-oriented languages. In this case, it is one of the basic mechanisms that guarantees encapsulation. We use it every time we create a new class, a method, or a field.
Are there any difficulties in managing scope? We deal with this well at the micro scale – we know when an element should be private, when public, and when protected. For fields, methods, and nested classes the choice usually follows from our current needs. Public means a public API. Protected (or possibly package-private, although less common) is useful when we want to use polymorphism and/or reach into the implementation of a base class from its derived classes. Private means that we do not want to share the internal logic of a given functionality with anyone. For top-level classes, we usually stay with what our IDE generates, which is public.
At the macro scale, too, we don't get into much trouble. In the case of monolithic applications, we throw everything into one jar and try not to lose our minds by using strict naming and packaging conventions. In the case of microservices we can use proven methodologies such as defined Bounded Contexts (DDD) and division in the style of 1 service = 1 BC. Some also use a rather extreme (by my standards) approach, where almost every feature receives its own service. This does not change the fact that there are practices and approaches in this area that we can emulate and adapt to our needs. Their rules are quite clearly defined, and the boundaries are visible. Any violations stand out, so we can catch them during Code Review.
Is there any other scale? Something in between micro and macro? Yes, there is. And it's at this in-between scale that we usually have the biggest problems. It is a real "no man's land". There is virtually no formalized set of good practices here. Everyone is a sheriff, and everyone is trying to work out their own laws and principles.
Is it possible to introduce any general, universal set of rules at this scale? In my career as a programmer and systems architect, I have been fascinated with different methodologies, patterns and frameworks. Many times I have tried to create a set of principles or a framework that would answer the above question, applying well-described practices from programming books, such as DDD, CQRS and Event Sourcing.
Throughout the years I have come to the following conclusions:
- The simplest solutions are the best
- Java itself has many built-in solutions that, when properly applied, can help us solve the above-outlined problems
A Bit of Privacy
When we create a new Java class, what is its default scope? If (like the vast majority of developers) we use our IDE to do it, the class will probably be public. We are accustomed to the convention of 1 *.java file = 1 public class. Everyone does it like that. But what is the actual default scope of classes in Java? Package scope, of course. Were the creators of Java trying to tell us something by that? Oracle advises in its official documentation: "Use the most restrictive access level that makes sense for a particular member. Use private unless you have a good reason not to." This advice covers more than fields and methods – it also applies to package-scope access for our Java classes.
Okay, but how, in practice, can I use package access in my code? After all, that would mean that I have to keep all my classes in one package, and I like to have an extensive package hierarchy. For the longest time I challenged myself with that exact question. Almost every class in my code landed in a separate, appropriately named package. I had a sense of order and a predetermined convention at the time. At some point, however, I asked myself – what is more important? Should I adjust the visibility of the components to my package philosophy, or the opposite – should I adjust the hierarchy of classes and packages to maintain the most restrictive access? It turned out that it was possible to make it work and to get a coherent, universal and readable methodology. However, before moving on to the examples, I would like to raise a problem which is, in a sense, the second pillar supporting the solutions I have proposed.
What is an API? It is the set of operations that a given element makes publicly available to its clients. This article deals with the intermediate scale, so let's put aside both the class-level API and the API of an entire application for another time, and focus on the Module API. The concept of a "module" fits this topic perfectly, as it is the least formalized unit of hierarchy in Java architecture.
Although Java 9 introduced modules to our world through the Jigsaw project, they do not solve the problems I have outlined. Jigsaw primarily serves to modularize the JDK itself, and its benefits can be appreciated mostly by the creators of libraries and/or reusable code with many consumers. Classic business applications are usually not able to harness the power of modules to improve the quality of their architecture.
What could our module be then? In my world, a module is a group of operations that I perform on a given set of objects or data (a domain model). A classic example is managing users in a system: we can create, edit, and delete users, fetch the list of users, retrieve the details of a given user, and assign a role to a user in the system or revoke it. So, let's create an empty API to start with, which will serve us to manage users:
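A minimal sketch of such a starting point might look like this (class and package names are illustrative):

```java
// users/UsersApi.java — the only public class of the "users" module,
// and therefore its single entry point. Operations will be added as we go.
public class UsersApi {
    // user-management operations will appear here
}
```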
Let's also create a barebone test that proves our code works as required:
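A barebone check might look like this (the original likely used a test framework such as JUnit; plain Java keeps the sketch self-contained):

```java
// At this point the test only proves that the module's API can be created at all.
public class UsersApi {
    // empty for now
}

class UsersApiTest {

    void shouldProvideUsersApi() {
        UsersApi usersApi = new UsersApi();
        if (usersApi == null) {
            throw new AssertionError("UsersApi should be instantiable");
        }
    }
}
```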
The first functionality we would probably like to have in our API is the ability to create a new user:
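That requirement can be sketched as a test (method names are illustrative; the API deliberately throws until we provide an implementation):

```java
import java.util.*;

// Creating a user should make it appear on the list of users.
public class UsersApi {

    public void createUser(String email) {
        throw new UnsupportedOperationException("not implemented yet");
    }

    public List<String> listUsers() {
        throw new UnsupportedOperationException("not implemented yet");
    }
}

class UsersApiTest {

    void shouldCreateUser() {
        UsersApi usersApi = new UsersApi();
        usersApi.createUser("john.doe@example.com");
        if (!usersApi.listUsers().contains("john.doe@example.com")) {
            throw new AssertionError("created user should be listed");
        }
    }
}
```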
Of course, our test will immediately fail, as we do not yet have an implementation.
So, let's write it. Our API class will serve us as a Façade at this point. It is one of the oldest and somewhat forgotten design patterns. Despite its age, however, it is great for our purposes. It will delegate the actual work to classes that contain implementation:
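A sketch of the façade, delegating to a package-private implementation class in the same package (names are illustrative):

```java
import java.util.*;

// UsersApi owns no business logic itself — it only delegates.
public class UsersApi {

    private final UserCreator userCreator = new UserCreator();

    public void createUser(String email) {
        userCreator.create(email);
    }

    public List<String> listUsers() {
        return userCreator.list();
    }
}

// Package scope — visible only inside the module's package.
class UserCreator {

    private final List<String> users = new ArrayList<>();

    void create(String email) {
        users.add(email);
    }

    List<String> list() {
        return List.copyOf(users);
    }
}
```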
So, let's see what our UserCreator looks like:
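A sketch of such an implementation class (illustrative; in the real module it sits in the same package as the API):

```java
import java.util.*;

// Note that both the class and its methods have package scope:
// this is implementation, visible only within the module's package.
class UserCreator {

    private final List<String> users = new ArrayList<>();

    void create(String email) {
        // the actual business logic of creating a user lives here
        users.add(email);
    }

    List<String> list() {
        return List.copyOf(users);
    }
}
```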
Have you noticed that both the UserCreator class and the create() method have package scope? This is an implementation of our module and as such should not be visible anywhere except to the module itself. This logic, despite being placed in a separate class (and file), is placed in the same package as the API. Thanks to this, our module has only one access point. Any operations we want to perform on users have to go through our public UsersApi. Over time, the number of classes providing implementation will significantly increase. There is absolutely nothing wrong with that. Each appropriately named implementation class will perform one function (SRP) and provide implementation to the API and its users. One module (which, as we have already established, is a logical unit containing a set of related operations) should boil down to one public class with a public API, which in turn will be the only entry point to the business logic contained in the module. In more complex cases, this may look like this:
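For illustration, a grown module's package might then contain files like these (all names are hypothetical):

```
users/
    UsersApi.java             (public — the module's API)
    UserCreator.java          (package-private)
    UserEditor.java           (package-private)
    UserRemover.java          (package-private)
    UserListProvider.java     (package-private)
    UserDetailsProvider.java  (package-private)
    UserRoleAssigner.java     (package-private)
```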
But now classes serving different purposes are mixed together – repositories, validators, etc... That's true. But is there really anything wrong with this (other than that it violates our habits)? All the business logic of a given module is in one place, and class naming conventions allow you to easily navigate such code.
What if there’s a lot of code in the module? In such case, one module can always be divided into several smaller ones, or sub-modules can be extracted (that communicate with each other through their public API classes):
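For illustration, two sub-modules might then communicate only through their public API classes (all names are hypothetical; each class would live in its own package and file):

```java
import java.util.*;

// roles/RolesApi.java — public API of the "roles" sub-module.
// ("public" omitted here only so the sketch compiles as a single file.)
class RolesApi {

    private final List<String> assignments = new ArrayList<>();

    public void assignRole(String email, String role) {
        assignments.add(email + ":" + role);
    }

    public List<String> assignments() {
        return List.copyOf(assignments);
    }
}

// users/UsersApi.java — the "users" sub-module depends only on RolesApi,
// never on the roles sub-module's internals.
class UsersApi {

    private final RolesApi rolesApi;

    public UsersApi(RolesApi rolesApi) {
        this.rolesApi = rolesApi;
    }

    public void grantAdmin(String email) {
        rolesApi.assignRole(email, "ADMIN");
    }
}
```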
The great added value of such a solution is the order in the public namespace. If you want to refer to any module, the IDE will consistently ignore classes with package access, showing only public APIs as available.
Of course, the actual implementation will surely be more complex. For example, you can move the implementation of user creation to a dedicated repository:
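A sketch of how that extraction might look (the interface and its methods are illustrative):

```java
import java.util.*;

// The persistence concern moves behind a package-private abstraction.
interface UserRepository {
    void save(String email);
    List<String> findAll();
}

// UserCreator no longer stores users itself — it delegates to the repository.
class UserCreator {

    private final UserRepository repository;

    UserCreator(UserRepository repository) {
        this.repository = repository;
    }

    void create(String email) {
        repository.save(email);
    }
}
```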
And prepare two implementations – production (integration tested, located in the production code) and test (unit tested, located in the test code):
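For illustration, the two implementations might look like this (the production SQL access is deliberately elided, not reconstructed):

```java
import java.util.*;

interface UserRepository {
    void save(String email);
    List<String> findAll();
}

// Production implementation (src/main/java): backed by a real database and
// covered by integration tests.
class SqlUserRepository implements UserRepository {

    public void save(String email) {
        // e.g. INSERT INTO users (email) VALUES (?) — elided in this sketch
    }

    public List<String> findAll() {
        // e.g. SELECT email FROM users — elided in this sketch
        return List.of();
    }
}

// Test implementation (src/test/java): a plain collection instead of I/O.
class InMemoryUserRepository implements UserRepository {

    private final List<String> users = new ArrayList<>();

    public void save(String email) {
        users.add(email);
    }

    public List<String> findAll() {
        return List.copyOf(users);
    }
}
```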
When we instantiate the UsersApi class for production, we will use the production implementation; when we instantiate it for testing, we'll use the test implementation:
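Both compositions might be sketched together like this (a simplified, hypothetical wiring; the only difference is the repository passed to the façade):

```java
import java.util.*;

interface UserRepository {
    void save(String email);
    List<String> findAll();
}

class SqlUserRepository implements UserRepository {
    public void save(String email) { /* real SQL access elided */ }
    public List<String> findAll() { return List.of(); }
}

class InMemoryUserRepository implements UserRepository {
    private final List<String> users = new ArrayList<>();
    public void save(String email) { users.add(email); }
    public List<String> findAll() { return List.copyOf(users); }
}

class UserCreator {
    private final UserRepository repository;
    UserCreator(UserRepository repository) { this.repository = repository; }
    void create(String email) { repository.save(email); }
    List<String> list() { return repository.findAll(); }
}

// Public in the real module; package-private here so the sketch is one file.
class UsersApi {

    private final UserCreator userCreator;

    public UsersApi(UserRepository repository) {
        this.userCreator = new UserCreator(repository);
    }

    public void createUser(String email) { userCreator.create(email); }
    public List<String> listUsers() { return userCreator.list(); }
}

class Wiring {
    // Production composition:
    static UsersApi production() {
        return new UsersApi(new SqlUserRepository());
    }
    // Test composition:
    static UsersApi forTests() {
        return new UsersApi(new InMemoryUserRepository());
    }
}
```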
The example above is simplified for the purposes of this article. However, it presents a very versatile and practical pattern that can be used in much more complex – and not necessarily CRUD-based – applications. I myself use it extensively in my applications, and the results are very satisfactory.
One of the determinants of a good architecture is its repeatability. Ideally, someone looking at a large app created by an entire development team should get the impression that it was written by one person. Learning how to navigate one module should allow you to navigate the entire application, regardless of its size.
You may have noticed that in the example above, we are not testing the UserCreator class directly. Unit testing is the next piece of the puzzle that fits very well into the entire scope management architecture.
Consider the definition of "unit" in the expression "unit tests". There are different types of automated tests – unit, integration, functional, acceptance and many others. One of the main differences between them is the scope of the code being tested. Most developers equate unit tests with the lowest-level tests that check individual classes. In that case, a Java class is our "unit". But is this a good approach?
To answer this question, let's consider why a developer would use automated tests. I can think of two main objectives:
1. Prove that the application has been implemented according to business requirements.
The application is never written in a vacuum. There is always some business context and there is (or at least should be) a person defining the requirements. The tests confirm that the requirements have been met and the application behaves correctly in different scenarios, especially the edge ones.
2. Protect your application from regression due to continuous development or bug fixes.
The tests give us a sense of security and courage in making code changes. We are sure that if we make any accidental, unintended changes, our tests will capture this and will not allow the introduction of a defective product into production.
Does the mere possession of automated tests and a high level of code coverage guarantee us that we can feel safe? Absolutely not! What’s more, if most of our tests are class-level unit tests, they are unlikely to sufficiently protect us against regression. Why?
As we said, our code should reflect the business requirements. So, is there any concept of a "class" (in the purely programming sense) in any business you know of? Probably not. A class is something foreign to business. It is our internal organizational unit that allows us to divide a problem into smaller parts. It gives us encapsulation and abstraction and improves the reusability of the code. However, it has nothing to do with business requirements. It is an implementation detail. And implementation, as we know, can change. Business requirements therefore naturally have a scope greater than individual classes.
In addition, we can define the following dynamic between implementation and business requirements:
- Changing business requirements entails a change in implementation.
- A change in implementation does not necessarily have to be caused by a change in business requirements.
It can be the result of refactoring, changing approaches, fixing bugs, improving performance, or updating dependencies (e.g. external libraries). So, if our automated tests are at the class level, any implementation changes will require us to change, or even reimplement, the test suite. Tests that are somewhat "glued" to specific classes do not focus on testing business requirements, but on their implementation. We may have a bizarre situation when, despite the lack of a change in business requirements, the desire to re-implement a piece of code will entail the need to reimplement hundreds or even thousands of tests. In such a situation, there is no protection against regression. The code is "immobilized" with tests, and the only thing that tests verify is the current implementation. In order to counteract such situations and to genuinely protect against regression, tests (and in particular asserts – places where we verify the validity of tests) should be changed only if the underlying business requirements are altered. However, in order to achieve this, such tests (yes, even unit tests!) should have adequate scope. Scope greater than a single class. I would argue that the appropriate scope for any unit test is a Module scope.
But wait a minute! There is something called integration testing! Shouldn't integration tests be responsible for testing the integration between classes? In theory, yes. In reality, though, integration tests have one very serious drawback: they are slow. A typical integration test requires us to bring up the Spring context, create an in-memory database, and interact with many I/O-heavy components such as JMS queues. In an ideal world, if we were not limited by the slowness of integration tests, our test code would consist only of them. Unfortunately, going down this path, although tempting, becomes more and more cumbersome as the application (and the test suite) grows over time. In extreme cases, the application build may take 40 minutes (or even more), with the full test suite executed only on the CI server.
Module-level unit tests are the best possible compromise. They behave almost like integration tests, yet run at incomparably higher speeds. Why only "almost" like integration tests? The compromise is to give up using I/O wherever possible. In this particular case, the actual (even in-memory) database must be replaced by collections. Take another look at our example – the test we wrote is at the API level. We can freely change the implementation underneath, and the test will pass as long as our business requirements are met.
This approach has another added value. It reassures us that our business logic does not rely on database-ensured data integrity. Database constraints are very useful, but our application should not rely on the assumption that, in the worst case, an SQLException will save the day. Any and all business-level validations should be performed in our Java code:
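For example, the uniqueness of a user's e-mail can be enforced by the module itself rather than by a unique constraint. A minimal sketch, assuming the illustrative UserRepository abstraction described earlier:

```java
import java.util.*;

interface UserRepository {
    void save(String email);
    List<String> findAll();
}

class UserCreator {

    private final UserRepository repository;

    UserCreator(UserRepository repository) {
        this.repository = repository;
    }

    void create(String email) {
        // business-level validation in Java code, not in the database
        if (repository.findAll().contains(email)) {
            throw new IllegalStateException("User already exists: " + email);
        }
        repository.save(email);
    }
}
```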
Should we therefore completely move away from classic integration tests? No. We should still use them, but in reasonable quantities that still allow for a fairly fast execution of our test suite. In my experience, the appropriate scenarios for full integration testing are:
- so-called "happy path” – a positive scenario showing that all components in all layers can "see" each other and work together properly
- places where business logic is outside of the actual Java code – for example, complex SQL queries that we wouldn’t want to mock
Are you saying that we should never use class-level unit tests? Absolute statements like "never" and "always" shouldn't be used too often, especially in the programming world. We can use class-level tests to verify our utilities. We often put all sorts of cumbersome logic into *Util classes, which in themselves are not directly related to business logic and often encapsulate technical problems. These classes should be tested separately, as they usually have many use cases. It would be highly impractical to test them all with "modular" unit tests.
Like a Spring Chicken!
Okay, but how can we mix all this with the Spring Framework? After all, no one writes code in pure Java these days. We have a Dependency Injection container and we want to use it. I fully agree. Spring gives us a lot of fantastic features and I use it wherever I can. However, its possibilities can become a source of further problems if we start using them too enthusiastically.
Since the popularization of IoC containers, the vast majority of developers have been more than happy to throw all their components and services into this giant "bag" so that everything can be magically glued together using @Autowired. This practice has become so widespread that using the keyword "new" in a context other than instantiating DTOs or Entities feels blatant, unpleasant, and distasteful to us. But is it the right way to go? Do all classes, services, repositories, factories, providers, and other contraptions deserve to be registered in an IoC container?
Like many other practices, this one at first seems to be the embodiment of convenience and flexibility. Unfortunately, in the long run, we set a snare for ourselves: the snare of cyclic dependencies. But how is that possible? After all, Spring protects us from cycles! We are not able to create components like these:
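For illustration, here is the shape of such a cycle in plain Java (in real code ServiceA and ServiceB would be @Component beans with constructor injection; the names are hypothetical):

```java
// With constructor injection, each class requires the other before it can exist.
class ServiceA {

    private final ServiceB serviceB;

    ServiceA(ServiceB serviceB) {
        this.serviceB = serviceB;
    }
}

class ServiceB {

    private final ServiceA serviceA;

    ServiceB(ServiceA serviceA) {
        this.serviceA = serviceA;
    }
}

// There is no order in which both objects can be constructed:
// new ServiceA(new ServiceB(new ServiceA(...))) — and so on, forever.
```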
Yes, at the code level Spring protects us from doing such things. In fact, if we use constructor injection – which is the only right way to do it – Java itself protects us from this situation, because we are simply unable to instantiate such classes. However, I am concerned about far more dangerous cycles. Ones that at first glance are invisible and harmless. The logical cycles.
What is a logical cycle? Every programmer divides the application code into smaller parts. In the context of this article, we separate the logic into Modules. If we register several publicly available components in an IoC container within one module, we are able to cause the following to happen:
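A hypothetical sketch of such a logical cycle (field injection stands in for @Autowired; all names are invented for illustration):

```java
// module "users"
class UsersApi { /* public API of the users module */ }

class UserOrdersReader {
    OrdersApi ordersApi; // @Autowired in real code: users -> orders
}

// module "orders"
class OrdersApi { /* public API of the orders module */ }

class OrderOwnerResolver {
    UsersApi usersApi;   // @Autowired in real code: orders -> users
}

// No single class depends on itself, so Spring wires this happily —
// yet the users and orders modules now depend on each other.
```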
Will Spring protest against the creation of such relationships? No. From a purely technical point of view, everything is okay. But does this mean that we are not dealing with a cycle? Oh yes, we are. There is an obvious cyclic dependency between our organizational units. Let's remember that our application will grow over time. After a while, the dependency diagram of our components turns into a dense, tangled web.
Are cyclic dependencies problematic? Very much so. They are effectively a showstopper for any larger refactoring effort. Trying to remove, change, or add a module will most probably force you to rewrite a large part of the application.
The situation presented above bears another rather serious problem: as our application grows, we cease to control what depends on what. We also forfeit the knowledge of how our services should be instantiated. We can grasp parts of the system, but the whole picture slowly slips away. This is, if you ask me, a very dangerous situation and quite a high price for the convenience of using @Autowired.
Okay, so is there any solution? Yes, there is. The only component of a module that should be registered in an IoC container is its public API.
All it takes is for each module to have its own @Configuration class, which will be responsible for delivering the instance of the API to the Spring container. Using this solution, we will gain two priceless things:
- By opening the @Configuration class, we'll see all the code needed to create our API. This may not seem important at first, but once our modules grow, their instantiation logic may look similar to this:
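A sketch of what such a configuration might boil down to (collaborator classes are stubbed, and all names are hypothetical; in real code the class carries @Configuration and the method @Bean):

```java
// ---- stubs standing in for the module's package-private implementation ----
interface UserRepository { }
class SqlUserRepository implements UserRepository { }
class UserValidator { }
class UserCreator {
    UserCreator(UserRepository repository, UserValidator validator) { }
}
class UserRemover {
    UserRemover(UserRepository repository) { }
}
// Public in the real module; package-private here so the sketch is one file.
class UsersApi {
    UsersApi(UserCreator creator, UserRemover remover) { }
}

// ---- the module's configuration: plain construction code, readable top-down ----
class UsersConfiguration {

    UsersApi usersApi() {
        UserRepository repository = new SqlUserRepository();
        UserValidator validator = new UserValidator();
        UserCreator creator = new UserCreator(repository, validator);
        UserRemover remover = new UserRemover(repository);
        return new UsersApi(creator, remover);
    }
}
```

The @Configuration class thus documents, in one place, exactly how the module is assembled — which is precisely the knowledge we otherwise lose to @Autowired.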