TDD: How to test something that doesn't exist

Introduction

Hello and welcome to the article that focuses on one of the most divisive and controversial methodologies in the Programmer's toolbox - Test Driven Development. I touched on this topic in my Scope Management in Java Architecture article. I'm returning to it, however, because it is, by the looks of it, a wildly misunderstood technique, and it deserves its own article.


Background

In the first years of my professional career I wasn't writing any automated Tests. I didn't know how to do it and, more importantly, I didn't know I was supposed to. I was under the impression that I should be aware of everything going on in my code base, to the point that I would be able to make changes and refactor things without introducing bugs. I was very efficient at manually testing my Applications just by 'clicking through them', and I was certain that this was the right way to develop Software. I know now how naive I was, but I must say that thanks to this approach I gained a general code awareness. I had to know which parts of the application I might potentially impact, and verify that there were no new bugs.

Around 2-3 years in, I started to write automated Tests. Some Unit ones, some Integration ones. I remember feeling that enormous relief. I finally had a way to be sure whether something worked or not, without having to manually go through the application. Things were good. I was certain that I had reached the highest possible tier of Software Engineering. And then I learned that something like Test Driven Development exists. That people actually start their development by creating Tests, followed by the implementation. Reading about it, I was always able to understand all the positives coming from it. Nevertheless, when thinking about applying it in my own work, I couldn't help but feel a giant


Disturbance in the Flow

We all know the Test Pyramid that defines the ideal ratio of the different kinds of automated Tests in a code base. According to it, the majority of the Tests should be Unit Tests - the ones that are the most lightweight, fast and isolated. Naturally, then, the TDD approach should use Unit Tests as its base, and (as the majority dictates) a Unit Test should only cover one Class.

These assumptions come with a set of serious implications:

"Design"-first approach

When creating the actual implementation, I rarely start by dividing the logic into small, SOLID-compliant classes and interfaces. The natural flow of work for me is to focus on making the implementation correct (from the requirements point of view) and performant. Once that is achieved, I can refactor it and divide it into smaller, correctly named and located pieces. Having good overall architecture principles surely helps with not putting everything into one method/package, but nevertheless, in my world it's the implementation that drives the inner code design, not the other way around. In other words, to be able to tell which classes I need and how to name them, I need to do the actual implementation. TDD, however, in order to work, tries to force me to come up with loads of empty interfaces that I'd create Tests for and then implement later. That felt, and feels to this very day, like an unnatural way of working, at least for me.

Interface-per-class fallacy

The Test-per-Class paradigm dictates that every Class needs to be tested in isolation. However, knowing that we need to come up with a Test before the implementation, we should create an Interface that we'd then use to write the Test. After the Test is created, we would create the implementation that follows it. That, for me, is a gruesome misuse of the Interface concept. Interfaces should be used only in places where we need to abstract things away for a reason. The reason can be that there will be multiple ways of doing one thing, or that we need to invoke completely different behaviors in the same way. When we see an Interface in the code, it is a clear hint that some kind of abstraction is going on in that place. Not every single class needs to be abstracted away. It's good to couple things that belong together. Having too many interfaces has significant cons:

  • they bloat the code base with unnecessary clutter
  • they actually impact the runtime, forcing the JVM to load more classes than it needs to
  • they make code navigation a pain because, when analyzing a piece of logic, we constantly need to jump from an Interface to its implementation
  • they hide away places in the code where the abstraction is actually happening
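
To picture the fallacy, here is a sketch of the kind of Interface pair that the Test-per-Class approach tends to generate. The names are hypothetical, invented purely for this illustration:

// InvoiceNumberGenerator.java - an Interface created only so that a
// Test could be written before the implementation; there is no real
// abstraction here.
public interface InvoiceNumberGenerator {
    String nextNumber();
}

// DefaultInvoiceNumberGenerator.java - its one and only implementation,
// which will never get a sibling.
public class DefaultInvoiceNumberGenerator implements InvoiceNumberGenerator {
    @Override
    public String nextNumber() {
        return "INV-" + System.currentTimeMillis();
    }
}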

Implementation-dependent Tests 

One of the best Testing practices is to keep the Tests independent from the implementation. What does that mean? In theory, changing the implementation shouldn't force us to update the Tests. Otherwise we lose the protection against regressions, and changes take more time due to the increased development effort. Is that possible when using a Test-per-Class approach? Hardly. Classes have dependencies that need to be Mocked/Stubbed during the Tests. Furthermore, when testing single Classes we often have to resort to using Spies and to verifying things like:

  • how many times a Method was called and with what Arguments
  • whether any Method other than the expected one has been called

etc. This is not good. Having such assertions in the Test code means that we are actually verifying the implementation, as the sketch below illustrates.
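
Here is a hypothetical sketch of such a Test, written with Mockito (PriceRepository and OrderCalculator are invented names, used purely for illustration):

@Test
public void shouldCalculateOrderTotal() {
    // A hypothetical dependency, mocked with Mockito:
    PriceRepository priceRepository = Mockito.mock(PriceRepository.class);
    Mockito.when(priceRepository.priceOf("book")).thenReturn(1000L);

    OrderCalculator underTest = new OrderCalculator(priceRepository);
    assertEquals(1000L, underTest.total(List.of("book")));

    // These two lines pin down HOW the total is computed, not WHAT it is.
    // Add a cache or batch the lookups, and the Test fails even though
    // the total is still correct.
    Mockito.verify(priceRepository, Mockito.times(1)).priceOf("book");
    Mockito.verifyNoMoreInteractions(priceRepository);
}

The last two lines weld the Test to the exact interactions between the Classes. And that leads to...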

Code immobilization

The Tests end up being literally welded to the implementation. Any change, even as trivial as

  • dividing a Method into a couple of smaller ones
  • extracting logic to a new Class
  • changing the Arguments of a Method 

will potentially cause hundreds, or even thousands, of tests to fail for no good reason. This makes refactoring a long, expensive, tedious and painful process, sucking all the joy out of maintaining and extending the code.


The Revelation

Does this mean that TDD is a failed methodology that should be avoided? Absolutely not! We just need to change one thing - we need to throw out the Test-per-Class approach. Furthermore, we need to understand that the inner structure of the code - with its classes and interfaces - is, as a whole, an implementation. A detail that we should pay no attention to when creating our Tests.

Once we understand that, Test Driven Development starts to yield great, and sometimes unexpected, results. We can create code that will be a pleasure to use and we can even discover potential high-level requirements errors, before writing even a line of implementation.


How?

By creating the Tests on the correct level - which is the API level. You can read more about my approach to creating APIs in the Scope Management article, in the "Frontside" section. This time, we will focus on an informative Example of how correctly applied TDD can help us create good, usable code.

Let's say we need a way to process a PDF Document. The logic would be multi-tiered and consist of the following steps:

  1. loading the Document
  2. reading the data of the Document
  3. cleaning up the data from the Document into a desirable format
  4. storing the Result of the processing

This is an example of a non-trivial process that is very similar to the one I had to model not so long ago. In the traditional implementation-first approach, as well as in the incorrectly applied TDD one, we would have to have a huge brainstorm and analyze dozens of different details like:

  • which library do we use to read the PDF?
  • do we want to use multiple threads to read and clean up the data?
  • where and how will we store the result? Do we use an RDBMS? Will we use JPA or plain SQL?

In reality, there is only one question to be answered at the beginning of every such design process:

How do I want to interact with it and what should it actually do? 

In my case, I wanted to have a flexible, non-blocking API that I could freely extend in the future. I began with writing an empty Test case:

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

public class DocumentProcessorSpec {

    private DocumentProcessor underTest;

    @BeforeEach
    public void setup() {
        underTest = new DocumentProcessor();
    }

    @Test
    public void shouldProcessPdfDocument() {
       
    }
}


And then I created my dream API - the one I wanted to use*:

@Test
public void shouldProcessPdfDocument() {
    // given: There is a PDF File to be Processed
    List<ProcessedRow> result = new ArrayList<>();
    String path = "test-files/test-file.pdf";
    Process process = underTest.createProcess(ProcessBuilder
            .create()
            // and: we load the file from the File System
            .loadFile(ProcessFileLoader.fromFile(path))
            // and: we read all pages of the File
            .read(ProcessFileReader.allPages())
            // and: we clean up the File with Trimmer
            .cleanupWith(ProcessFileCleaner.trimAll())
            // and: we remove Header and Footer from the Data
            .cleanupWith(ProcessFileCleaner.removeHeaderAndFooter())
            // and: we store the result in a Collection
            .sendTo(ProcessFileResultHandler.collector(result))
    );

    // when: the Process is started
    process.start();

    // and: we wait at most 10 seconds for the Process to finish
    Awaitility.await()
            .atMost(10, TimeUnit.SECONDS)
            .until(process::isFinished);

    // then: the result size is as expected
    assertEquals(EXPECTED_RESULT_SIZE, result.size());

    // and: the result is correct
    verifyResult(result);
}


It was really easy, because I was not limited by any existing implementation. Also, while creating the API I didn't have to think about any implementation details. Notice that the API doesn't define which library I have to use to actually read the PDF File.

Also, during the API creation, I discovered that I don't want to couple it with any Persistence Layer. Instead, I created a ResultHandler that I will be streaming the resulting data into. It will be up to the Handler to decide what to do with it - collect it into a List, store it in a Database, or send it off for further processing. I don't know, and I don't want to decide now. Were I to make such a discovery while using the Test-per-Class approach, I'd risk losing hours of work already poured into creating all the Interfaces and Tests.
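
To illustrate, the collector used in the Test above could be as simple as a functional interface with a factory method. This is a sketch of one possible shape (assuming java.util.Collection), not a committed design:

// One possible shape of the ResultHandler: it receives each
// ProcessedRow as the Process produces it.
public interface ProcessFileResultHandler {

    void handle(ProcessedRow row);

    // Collects the rows into a caller-supplied Collection - exactly
    // what the Test needs. A Database-backed Handler would be just
    // another implementation, added later.
    static ProcessFileResultHandler collector(Collection<ProcessedRow> target) {
        return target::add;
    }
}

Note that this is one of the places where an Interface is actually justified: there will be multiple ways of handling the result, all invoked in the same way.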

In the process of defining the API, I created a bunch of classes that represent different configuration utilities, making my API as flexible as possible. I was able to focus on their proper, explicit and descriptive naming, so that the API is natural to use. The Test ends with typical assertions, specific to the tested File.

The Test will, of course, fail at this time. That is the whole point. Now that I have it, I can begin to think about how to do the actual implementation. Because the Test doesn't assume anything in that regard, I will be free to change the internal design on the fly, without having to worry about changing/removing already defined Interfaces and their respective Tests.
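
The first step towards green can be a bare skeleton - just enough for the Test to compile and fail honestly. A sketch (DocumentProcessor and Process will grow from here):

// DocumentProcessor.java - just enough code for the Test to compile
// and go red.
public class DocumentProcessor {

    public Process createProcess(ProcessBuilder builder) {
        throw new UnsupportedOperationException("PDF processing not implemented yet");
    }
}

// Process.java - exposes only what the Test needs: start it, poll it.
public interface Process {

    void start();

    boolean isFinished();
}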

If, in the future, my implementation turns out to be inefficient, I will be able to refactor it with confidence, knowing that I have my Tests to verify whether the requirements are still met after the code update.

Conclusion

It has taken me a very long time to understand the essence of TDD. It's crucial, because if incorrectly applied, TDD has the potential to turn a programmer's life into a living hell filled with forced, inverted workflows, weird unnatural compromises, and hours spent fixing Unit Tests after every small implementation change.

On the other hand, if applied correctly, TDD can help you define the expected behaviors of your code without assuming anything about the implementation. That's why I think that the only valid TDD is actually BDD - Behavior Driven Development.

Do I always use it, for every piece of code I write? No. I think it is a great Tool that allows me to define good APIs and verify the requirements. Once that's done, I tend to use a mixed approach and rely on my gut feeling. Sometimes I start with creating Tests, sometimes I do the implementation first. It all depends on how much I know about the context and the usage of the thing I'm building. It also depends on the complexity. I won't use TDD to create some simple CRUDs. However, if I need to define more complex, sophisticated logic, TDD is always the way to go.

Thanks for reading!


* In a real-life scenario, I would create many such Test cases, covering all possible branches of the logic. Here I wanted to show what one of them could look like.
The API created in the example was my dream API. After adding some static imports it reads almost like a sentence. Yours could look completely different. The important thing is that you'd be able to freely define it.

Comments

  1. Hi Slawek! Another great article. I wish I had read articles like this when I was a junior dev learning TDD. I had to learn it through pain, exactly the way you described in this post. I cannot agree with you more. One other rule I try to follow when using TDD - tests shall be as close to production as possible. For example, if there is a way to have a fast DB similar to the one in prod, I go with that in tests, no matter what we call it - unit or integration. Same with other technologies used in prod. Otherwise all my tests can be green but the whole application fails miserably in prod :) Once again, thanks for the post!
