
We gush about unit testing without understanding software testability

Photo by Luca Bravo / Unsplash (This is used as thumbnail, don't mind it too much)

In software development circles, it is a well-known fact that software testing, and specifically writing automated tests, is paramount to ensure your application works as intended and delivers on its promises.
In fact, the web is riddled with articles highlighting the importance of unit tests and their benefits, often supported by personal anecdotes. And in all fairness, roughly 40% of a project's total time and costs are spent on software testing[1].

Unfortunately, no matter how much people gush about the importance of unit tests (usually presented as the sole acceptable level of automated testing, with integration and end-to-end tests forgotten entirely), testing is still widely considered a "dark art" of software engineering[1:1].

However, articles pointing out practices for writing testable code are rare. Rarer still are those explaining how to measure to what extent a piece of code (or software artifact) is testable. This measure is called testability, and it is much more thoroughly explored in the scientific literature. In fact, it originates from a 1994 study by Robert V. Binder, who defines testability as:

The relative ease and expense of revealing software faults.[2]

Ensuring testability requires knowledge, rigorous development processes (such as Test-Driven Development) and thorough verification by software integrators and maintainers. More often than not, these verifications are overlooked in favor of implementing the required features as soon as possible.

We've all been there as software developers: a higher-up asks you to "ship, ship, ship" because the deadlines have been pushed too close for comfort, and you are caught in the middle trying to finish the required features as soon as possible. Testing their robustness is not an option in that case, so you just hope they will not cause massive problems down the line, without any guarantee.

Traditionally, in the literature, testability is articulated around two dimensions: metrics and patterns. Metrics give a quantitative indication of testability and assess how difficult it is to write tests for a software artifact. Multiple metrics have been proposed to measure testability, for instance the number of lines of code, the number of operators, or the cyclomatic complexity[3] (illustrated briefly below). Unfortunately, these metrics do not give a granular evaluation of testability, only a global value, which limits their usefulness in a day-to-day software development process.
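As a quick illustration of the kind of value such a metric produces, here is a small C function annotated with its cyclomatic complexity. The function is made up for this example, not taken from any particular codebase.

#include <stddef.h>

// Cyclomatic complexity = number of decision points + 1.
// One `for` and one `if` give this function a complexity of 3: a single
// global number telling us the function is simple, but not *which* part
// of it would be hard to test.
int count_positive(const int *values, size_t length) {
    int count = 0;
    for (size_t i = 0; i < length; i++) {  // decision point 1
        if (values[i] > 0) {               // decision point 2
            count++;
        }
    }
    return count;
}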

Another approach is to avoid a set of anti-patterns that hinder testability without necessarily being bugs[3:1]. This empirical approach is much better suited to software development processes.

This is the crux of the issue. For a software developer, metrics are great for auditing a piece of software overall, but they can't help you write more testable code in the moment. We will therefore focus on identifying these anti-patterns and on how to refactor or avoid them.

When your code smells

All the patterns we will go through are Code Smells. If you're familiar with Martin Fowler's Refactoring, you might guess that they are "a surface indication that usually corresponds to a deeper problem in the system"[4]. And you would be right!

We all write code smells from time to time. And when refactoring (if it happens, mind you), we usually get rid of most of them. But sometimes these code smells are ingrained in the code, or in our habits, and we don't necessarily know how to refactor them.

Even when following test-driven development, a smell can introduce an edge case we didn't think about during the testing phase because it wasn't part of the design. It goes unnoticed, possibly all the way to production, and when that edge case finally occurs, the feature breaks and we need to fix the problem ASAP.

But what if we didn't write them in the first place? That requires learning to identify these anti-patterns and the early signs that you are on the wrong track.
So, let's take a look at some anti-patterns to avoid in our daily work, shall we?

The Catalog™

Let me introduce you to The Catalog™. It is an online catalog of Code Smells by Marcel Jerzyk and Lech Madeyski. This taxonomy is, by far, the most comprehensive list of code smells I have had the pleasure of stumbling upon. For this article I will highlight some smells from the catalog that specifically hinder testability, but I highly encourage you to check it out in full.

My only gripe with this catalog is the number of smells applicable only to Object-Oriented Programming. But that's to be expected, considering it is one of the most popular paradigms in the world at the time of writing.

Without further ado, let's dive into some of these smells.

Side Effects

When writing functions, it can be tempting to write one that does too much; more specifically, a function whose name says it does one thing while it quietly takes on other responsibilities. Take a look at the following example in C.

Here we want a function that writes an image to disk and returns 1 if there was an error, 0 otherwise.

int write_image(point3D *projection, size_t projection_size, int width, int height, 
                 float occlusion_threshold, char *path) {
    image_content *content = generate_content_from_projection(projection, projection_size, width, height);
    FILE *fp;
    
    if(occlusion_threshold > 0) {
        apply_occlusion(content, occlusion_threshold);
    }
    
    
    fp = fopen(path, "wb");
    
    if (fp == NULL) {
        printf("Error in opening file %s\n", path);
        free(content);
        return 1;
    }
    
    for(int i = 0; i < width * height; i++) {
        fprintf(fp, "%c%c%c", content->pixmap[i].red, content->pixmap[i].green,
                content->pixmap[i].blue);
    }

    free(content);
    fclose(fp);
    return 0;
}

However, this function has multiple side effects beyond what its name advertises:

  1. It generates the content of the image from a UV projection of 3D points.
  2. It calculates occlusion for the image.

It clearly breaks the Single Responsibility Principle, and the function doesn't tell you that: you have to actually look at its body and signature to get that information.

This adds new cases during tests: what if we set an occlusion threshold? How can we test that the projection is faithful to what we're supposed to write? And so on.

The function is also bloated with functionality compared to what it's supposed to do (i.e. write an image to disk). A solution is to extract each responsibility so that each one can be tested on its own:

int write_image(image_content *content, int width, int height, char *path) {
    FILE *fp = fopen(path, "wb");
    
    if (fp == NULL) {
        printf("Error in opening file %s\n", path);
        return 1;
    }
    
    for(int i = 0; i < width * height; i++) {
        fprintf(fp, "%c%c%c", content->pixmap[i].red, content->pixmap[i].green,
                content->pixmap[i].blue);
    }

    fclose(fp);
    return 0;
}

int main(int argc, char *argv[]) {
    if(argc != 2) {
        printf("Unrecognized arguments);
        return 1;
    }

    image_content *content = generate_content_from_projection(PROJECTION, PROJECTION_SIZE, WIDTH, HEIGHT);
    
    if(OCCLUSION_THRESHOLD > 0) {
        apply_occlusion(content, OCCLUSION_THRESHOLD);
    }
    
    int exit_code = write_image(content, WIDTH, HEIGHT, argv[1]);
    free(content);
    
    return exit_code;
}

Then we can use this function alongside generate_content_from_projection and apply_occlusion and test their responsibilities individually. Testing write_image now only requires checking that the image is correctly written to disk, as in the sketch below.
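To make this concrete, here is a rough sketch of what a test for write_image alone could look like. The pixel and image_content layouts below are assumptions inferred from the snippets above, and the plain assert-based harness is just for illustration, not a prescribed testing framework.

#include <assert.h>
#include <stdio.h>
#include <string.h>

// Assumed layouts, inferred from the snippets above; in a real code base
// these would come from the project's headers.
typedef struct { unsigned char red, green, blue; } pixel;
typedef struct { pixel *pixmap; } image_content;

int write_image(image_content *content, int width, int height, char *path);

void test_write_image_writes_pixels_in_order(void) {
    // Hand-built 2x1 image: one red pixel, one blue pixel.
    pixel pixmap[2] = { {255, 0, 0}, {0, 0, 255} };
    image_content content = { .pixmap = pixmap };
    char path[] = "test_output.raw";

    assert(write_image(&content, 2, 1, path) == 0);

    // Read the file back and compare the raw bytes.
    FILE *fp = fopen(path, "rb");
    assert(fp != NULL);
    unsigned char bytes[6];
    assert(fread(bytes, 1, sizeof bytes, fp) == sizeof bytes);
    fclose(fp);

    unsigned char expected[6] = { 255, 0, 0, 0, 0, 255 };
    assert(memcmp(bytes, expected, sizeof expected) == 0);
}

void test_write_image_reports_unwritable_path(void) {
    image_content content = { 0 };
    char bad_path[] = "/nonexistent/directory/out.raw";

    // fopen fails, so the function should report the error with 1.
    assert(write_image(&content, 0, 0, bad_path) == 1);
}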

More information on this smell here.

Fallacious Function Names

To test a function, we need to know what it does and what it doesn't. Correct naming is paramount to ensure testability: people writing automated tests against your code should be able to quickly identify what your function needs to be tested for and which edge cases should be taken into account.

For example, the following Haskell function pretends to return a lazily generated, infinite list of right triangles.

rightTriangles :: (Integral a) => a -> [(a,a,a)]
rightTriangles maxLength = [ (a,b,c) | c <- [1..maxLength], b <- [1..c], a <- [1..b], a^2 + b^2 == c^2 ]

But wait, what is that maxLength parameter, you ask? Well... this function lied to you: it doesn't return an infinite lazy list of right triangles, but a list of right triangles with sides between 1 and maxLength. That is misleading because the function's actual responsibility is not what its name suggests, which changes both how you use it in other code and what you should be testing.

Renaming the function to explicitly show its responsibility is the solution to this smell:

rightTrianglesOfMaxLength :: (Integral a) => a -> [(a,a,a)]
rightTrianglesOfMaxLength maxLength = [ (a,b,c) | c <- [1..maxLength], b <- [1..c], a <- [1..b], a^2 + b^2 == c^2 ]

Everything that obscures the true logic behind a function hinders testability drastically.

More information on this smell here.

Mutable Data

Mutable data can create side effects in other parts of your code. Specifically, passing mutable data around between functions can cause unpredictable side effects when the value is changed in some part of the code in a way the current function didn't anticipate. Mutability makes testing especially tricky: it forces you to mock the mutable data into specific states for the tests to pass, and adds unnecessary overhead when taking edge cases into account.

The following C function performs histogram normalization on a gray-scale image, using its histogram, its intensity matrix (hereafter graymap) and its size (rows * cols), and updates the histogram after normalization.

void histogram_normalization(image_file *content, histogram_t **histogram) {
    int max_intensity = histogram_max(*histogram);
    int min_intensity = histogram_min(*histogram);
    for (int i = 0; i < content->rows * content->cols; i++) {
        unsigned char intensity = content->graymap[i];
        float new_intensity = content->max * (intensity - min_intensity) /
                              (max_intensity - min_intensity);
        content->graymap[i] = (unsigned char)roundf(new_intensity);
    }

    free_histogram(*histogram);
    *histogram = compute_histogram(content);
}

Here we see the function's biggest problem: it modifies its parameters histogram and content, turning them into the new normalized histogram and the modified image, respectively. That is a problem because we can no longer use the histogram computed on the original image: it's impossible to recompute it since the image has been modified, and it's also impossible to plot the histogram before and after normalization if we choose to do so. Worse, this makes testing for expected values computationally expensive and a nightmare, especially if some other function relies on histogram normalization in its body.

The solution to this problem isn't easy or one-sided: if we want to keep every value separate and immutable, we can keep the histogram bound inside the image_file structure and have histogram_normalization allocate and return an entirely new image. This addresses our unwanted mutability, but it's not the only possible solution. The following code implements it:

// Passing content by value instead of by reference to avoid modification.
// It's a big change that might not be suited to every code base
image_file *histogram_normalization(image_file content) { 
    int max_intensity = histogram_max(content.histogram);
    int min_intensity = histogram_min(content.histogram);
  
    // Allocate a new image to be returned by the function
    image_file *modified_image = allocate_image(content.cols, content.rows, content.max); 
  
    for (int i = 0; i < content.rows * content.cols; i++) {
        unsigned char intensity = content.graymap[i];
        float new_intensity = content.max * (intensity - min_intensity) /
                              (max_intensity - min_intensity);
        modified_image->graymap[i] = (unsigned char)roundf(new_intensity);
    }
    
    // `compute_histogram` could be changed to modify the histogram in the structure directly,
    // but that would add more unwanted mutability
    modified_image->histogram = compute_histogram(modified_image);
    
    return modified_image;
}
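With this immutable version, a test can keep the original image around and compare it against the normalized copy. Here is a minimal sketch of such a test; build_gray_image and free_image are hypothetical helpers assumed for illustration (the former building an image_file with its histogram already computed), not functions from the original code.

#include <assert.h>

// Hypothetical helpers assumed for this sketch:
//  - build_gray_image(cols, rows, max, pixels) allocates an image_file,
//    copies the pixel intensities and attaches its computed histogram.
//  - free_image releases an image_file and its histogram.
void test_histogram_normalization_stretches_range(void) {
    unsigned char pixels[4] = { 50, 100, 150, 200 };
    image_file *original = build_gray_image(2, 2, 255, pixels);

    image_file *normalized = histogram_normalization(*original);

    // The original image is left untouched...
    assert(original->graymap[0] == 50);
    assert(original->graymap[3] == 200);

    // ...while the returned copy is stretched to the full [0, 255] range.
    assert(normalized->graymap[0] == 0);
    assert(normalized->graymap[3] == 255);

    free_image(original);
    free_image(normalized);
}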

Mutability is tricky because its side effects are everywhere and seemingly invisible until another part of the code accesses the mutated data. Reasoning about these variables is complicated and can get out of hand quickly.

More information on this smell here.

Feature Envy

In object-oriented programming, we sometimes encounter a class that uses the methods and fields of another class more extensively than it should. This is tricky because it breaks testability: the class we are trying to test needs the other class to be mocked so we can get some isolation. Take a look at this example in Java:

class User {
    String getAddress(ContactInfo contactInfo){
        return String.format("%s %s, %s, %s", contactInfo.streetNumber, contactInfo.streetName, contactInfo.city, contactInfo.state);
    }
}

class ContactInfo {
    String email;
    String phoneNumber;
    String name;
    String state;
    String city;
    String streetName;
    String streetNumber;
}

Here the User class depends heavily on ContactInfo. So much so that it uses every single one of its attributes. This example is, for all intents and purposes, egregious, but you get the idea: User has a Feature Envy of ContactInfo. In this case, it is better to merge both classes during a refactor:

class User {
    String email;
    String phoneNumber;
    String name;
    String state;
    String city;
    String streetName;
    String streetNumber;

    String getAddress(){
        return String.format("%s %s, %s, %s", this.streetNumber, this.streetName, this.city, this.state);
    }
}

This way we don't need to mock data from ContactInfo into User while testing; instead we just define a good, complete User with its own responsibility, ensuring we can test it thoroughly.

Of course, depending on the complexity of the User class, you could simply have moved the method that builds the address into ContactInfo, as such:

class User {
    // Content of User
}

class ContactInfo {
    String email;
    String phoneNumber;
    String name;
    String state;
    String city;
    String streetName;
    String streetNumber;
    
    String getAddress(){
        return String.format("%s %s, %s, %s", this.streetNumber, this.streetName, this.city, this.state);
    }
}

Refactoring isn't a set of one-size-fits-all solutions, but rather a process of choosing the best course of action to make your code more scalable and, in our case, more testable.

More information on this smell here.

Addendum

Most of the examples are heavily simplified to make my points. In particular, the ones that write something to disk would have far more edge cases to take into account, especially in C. Still, I wanted to avoid trivial examples in favor of ones with some subtlety, because professionals very rarely write blatantly smelly code. The Java example might be the most trivial of the lot.

Besides, most examples you can find out there are rather inorganic and easy to spot, while in reality smelly code is smelly because it works yet reveals a problem in its design. Spotting programs that fail to compile is easy; spotting programs that can produce runtime bugs and inefficiencies due to a deeper design problem is complicated.

I hope this hits home for some of you readers.

Going further on testability

All the sources I used are either directly referenced in the article body or in the footnotes. To get a grasp of testability in object-oriented systems, I highly suggest reading Binder's study Design for testability in object-oriented systems.

If you're interested in refactoring principles and clean code, you can't miss Refactoring: Improving the Design of Existing Code (2nd Edition) by Martin Fowler and Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin.

This article didn't get into using testability metrics to audit your software artifacts; if you are interested in that, and read French, you could read Shaheen's study Validation de métriques de testabilité logicielle pour les programmes objets.

If you struggle with writing coherent and relevant tests, checking out The Art of Software Testing, 3rd Edition by Glenford J. Myers et al. would be a really good idea; it is a solid starting point if you're unsure about your abilities when it comes to software testing.

And finally, you should keep The Catalog™ close to you when refactoring, at least if you're unfamiliar with the practice. It is a pleasure to work with since it is so well made and the taxonomy is really comprehensive.

Closing thoughts

I had the pleasure of working on software testability during my internship in June 2023 at the Laboratoire d'Informatique de Grenoble (LIG-UGA), during the first year of my Master's, with Lydie du Bousquet. If what I wrote earlier feels familiar, it's because you have read my internship report! That report was unfortunately never published, as it was mostly used as a grading tool for the internship, but I think its content is really interesting.

I don't know whether I can publish it of my own accord, so I'm reusing my previous work to write this article.

Knowing how to identify these code smells and avoiding them as much as you can will save you time during testing, and you'll thank yourself for thinking ahead of time!

That's all for today, until next time!
Cheers,
Virgil


  1. Glenford J. Myers, Corey Sandler, and Tom Badgett. The Art Of Software Testing, 3rd Edition. John Wiley & Sons, Hoboken, New Jersey, 2011. ↩︎ ↩︎

  2. Robert V. Binder. Design for testability in object-oriented systems. Communications of the ACM, 37(9), September 1994. ↩︎

  3. Muhammad Rabee Shaheen. Validation de métriques de testabilité logicielle pour les programmes objets. October 2009. ↩︎ ↩︎

  4. Martin Fowler. Refactoring. Addison-Wesley Signature Series (Fowler). Addison-Wesley, Boston, MA, 2nd edition, 2018. ↩︎