I’d like you to stop and think about the following question.
Why do you write tests?
Seriously. Stop reading, sit back, and think about it for a couple of minutes. Maybe jot down an item or two on a bullet list. Do it now, and I’ll go make a cup of tea.
What did you come up with? Here are some of the reasons I’ve heard over the years. First come two sad ones:
I have to write tests: they are part of our project standards.
I feel I should write tests because everyone says I should.
And then some perfectly good reasons:
I write tests to verify that my code works.
Tests mean that I can more safely change my code in the future.
Tests help me explore my APIs before I use them elsewhere in my system.
Tests drive me to write code that is more decoupled.
Each one of these is a credible assertion. Each seems like common sense. Who wouldn’t want to know if their code works? Who doesn’t want to write code in more isolated, decoupled chunks?
The problem is that assertions, particularly ones that seem to be virtuous, have a nasty habit of becoming accepted facts.
Personally, I don’t think something deserves to be a fact unless it can be tested.
Testing Testing
Let’s think up some experiments to try to validate whether the four testing assertions above are actually true. But, before we do, a warning. Truth is almost always contextual. Most people would assert that stabbing someone in the throat is always bad; but it’s good if you’re performing a tracheotomy. In all the discussions that follow, I’m going to have to generalize, because I don’t know your specific context. That’s why I’ll be suggesting ways you could experiment with the testing assertions personally, in your own context. Anyone who tries to sell you on a universal do-this-rule is either stupid or fraudulent.
Assertion: I write tests to verify that my code works
This is an obvious assertion: verifying functionality is pretty much the definition of testing. But, as with all assertions, we really should test to see if it is true. How?
Here’s what I did a while back. I first made sure that I had a handle on the number of faults that surfaced in the code that I delivered. This code had gone through my normal testing process: it wasn’t 100% coverage (because that’s kinda dumb) but it was decent.
Once I had the baseline, I stopped writing unit tests.
Some of the impacts that I measured I’ll be talking about when we come to the other assertions. Looking just at bugs, however, I discovered two main things:
Many of the bugs that would have been found by testing were actually found instead as I was using the buggy code in other code. In every case I encountered, the source of the bug was pretty obvious: I can’t remember triggering a bug that was more than two levels of call away from the place where it impacted me.
To my surprise, the failure rate of the system once delivered was slightly higher in the first few days, but then fell back to about the same levels I’d seen previously in tested code.
I looked into the failures in the delivered code, both before the experiment and during, trying to work out why there was so little variance.
It turned out that most of these bugs had the same root cause: I had misunderstood something about the application while I was writing it. I hadn’t realized that a value might be missing, or that a key could be duplicated between two data rows. And, because I didn’t know about these things, I wouldn’t have written tests for them in the first place.
Conclusion: for me, on the kinds of projects I’m writing, religiously writing tests seemed to have little impact on my delivered error rate.
Assertion: Tests mean that I can more safely change my code in the future
This assertion seems very plausible: having a good suite of tests reduces the chance of introducing regressions while making future changes.
To test this out, I had a look at what happened when I came to update the code I’d written without tests.
This is anecdotal, but I found the lack of tests made changes considerably more stressful. For every change I made, I’d need to manually explore the future trajectory of any affected data, making sure that my modifications didn’t break any assumptions elsewhere in the code.
In the end, I found myself writing canary tests: functional-level tests that verified the behavior not of the changes I made, but of the code that relied on that changed code.
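To make that concrete, here’s a minimal sketch of the shape of a canary test, in Python, with every name invented for illustration: the change lands in a low-level `parse_amount`, but the canary exercises `monthly_total`, the code that relies on it.

```python
# A hedged illustration of a canary test; every name here is invented.
# Suppose I've just changed how parse_amount handles thousands separators.
def parse_amount(raw: str) -> float:
    return float(raw.replace(",", ""))

def monthly_total(rows: list[str]) -> float:
    # Downstream code that relies on parse_amount's behavior.
    return sum(parse_amount(r) for r in rows)

# The canary doesn't test the change directly; it verifies that the code
# depending on the changed code still behaves as it did before.
def test_monthly_total_survives_parse_change():
    assert monthly_total(["1,200.50", "99.50"]) == 1300.0
```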
On the face of it, the assumption that tests are a benefit during code updates would seem to be valid. However, there’s another side to consider. This is not something I’ve explicitly tested, but instead comes from my experience.
Over the years, I’ve learned the hard way that it’s a mistake to simply add new chunks of code at some convenient point in an existing codebase.
When the app was first written, I was constantly refactoring the code; any friction when adding new code was a hint of some kind of problem with the existing codebase. One of the joys of decoupled code is that it is easier to restructure, as changes are more likely to be localized.
However, what happens when I come to modify that code later on? Again, adding the new code could prompt a refactoring of the existing code. But if that code has a lot of unit tests, then (and I’m not proud of this) I just couldn’t bring myself to update both the existing tests and the existing code, and so I carbuncled the change in where it could fit. My laziness meant that the tests would tend to lower the quality of the app over time.
Conclusion: Overall, and for me, on the kinds of projects I do, the idea that having tests makes future changes less fraught seems to be true. As a result, although I may not have a full test suite when entering into a change, I’m likely to add some high-level tests before I actually make that change.
Assertion: Tests help me explore my APIs before I use them elsewhere in my system
The idea behind this assertion is that your test will be the first time you call the code you just wrote (or are about to write, for you Test-First folks). This gives you a fresh perspective on that code, and often that perspective gives you hints that things aren’t quite right.
A key indicator (for me) is when the tests are tricky to write because they require setting up a bunch of context: your `format_name` function takes a `User` object as a parameter, so each test must go through the pain of constructing a valid `User` before it can call `format_name`.

When that kind of thing happens to me, I stop and wonder why my formatting function needs the entire user object: could I just pass in the first and last names, and perhaps a title? Or maybe this friction caused by writing the test is saying that the various name fields shouldn’t just be separate attributes of a user; instead I might need to have `Name` objects which `User` could reference.
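Here’s a minimal sketch of where that second line of thinking might lead, assuming Python and inventing every detail beyond the names `format_name`, `User`, and `Name` from the paragraph above:

```python
# A sketch only: format_name, User, and Name come from the text above;
# every other detail is invented for illustration.
from dataclasses import dataclass

@dataclass
class Name:
    first: str
    last: str
    title: str = ""

@dataclass
class User:
    name: Name        # User now references a Name object...
    email: str = ""   # ...alongside whatever else a real User carries

def format_name(name: Name) -> str:
    # The function now asks only for what it needs: a Name, not a whole User.
    parts = [name.title, name.first, name.last]
    return " ".join(p for p in parts if p)

def test_format_name_includes_title():
    # The test no longer has to construct a valid User just to check formatting.
    name = Name(first="Ada", last="Lovelace", title="Countess")
    assert format_name(name) == "Countess Ada Lovelace"
```

Either refactoring shrinks the context each test needs, which is exactly the hint the friction was giving.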
This assertion and the next were my personal go-to reasons for writing tests. It turns out that what I learned about this API assertion mirrors my discoveries about the decoupling assertion, so I’m going to defer my conclusion until the next section.
Assertion: Tests drive me to write code that is more decoupled
Code that has a lot of coupling is very difficult to work with, because you can never fully know the scope of a change you make.
It turns out that unit tests are a great way of discovering some types of coupling. If you want to test function A, but to do so you need to construct half a dozen new objects, involving calls to twenty other functions, then you’re being forced to live through the pain imposed by A’s degree of coupling.
So, just as with APIs, a test that is difficult to write because the thing under test requires a lot of context is a red flag: you might get the test to run, but underneath that success you know that the code it is testing smells.
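As a contrived, self-contained illustration (none of this comes from my experiment; every name is invented), here’s what that pain looks like: three collaborators have to be assembled before the test can call the one function it cares about.

```python
# Every name below is invented; the point is the shape of the setup.
from dataclasses import dataclass, field

@dataclass
class Catalog:
    prices: dict[str, float]

@dataclass
class Customer:
    id: int
    discount: float = 0.0

@dataclass
class Cart:
    customer: Customer
    items: list[str] = field(default_factory=list)

def compute_invoice(cart: Cart, catalog: Catalog, tax_rate: float) -> float:
    subtotal = sum(catalog.prices[item] for item in cart.items)
    subtotal *= 1 - cart.customer.discount
    return round(subtotal * (1 + tax_rate), 2)

def test_compute_invoice():
    # Three collaborators just to exercise one function: the setup pain
    # is the signal that compute_invoice is coupled to too much context.
    catalog = Catalog(prices={"book": 10.0, "pen": 2.0})
    customer = Customer(id=1, discount=0.1)
    cart = Cart(customer, items=["book", "pen"])
    assert compute_invoice(cart, catalog, tax_rate=0.2) == 12.96
```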
The thing I feared most when I started my experiment of not writing unit tests was the damage it might do to the design of the code I was writing. I worried that coupling would sneak in if I had no tests to keep the code simple.
In practice, I discovered something surprising. And, before I go on, I have to confess that, of all my observations in this experiment, this is likely to be the most personal and least applicable generally.
Anyway, to my surprise, I discovered that, although I was not writing unit tests, I was constantly thinking about writing them. “How would I test this” was running in a kind of loop all the time I was coding. It turns out that just considering that question was enough to trigger the doubt that would end up driving the refactoring I needed. In effect I was getting the design benefits of testing without actually writing the tests.
I think part of the reason for this is that I had spent decades before the experiment doing various kinds of testing, and the various facets of creating tests had become part of my tacit brain. (This is also what happens when you learn to drive. Initially, you have to think about everything you do, but after a while it becomes almost automatic: the tacit side of your brain has developed the reflexes and intuitions to handle most of the act of driving for you.)
My experience was that I was refactoring code just as often when I developed with tests as when I developed without. Looking back at code written without tests, I really don’t see too much I’d want to change in terms of structure.
Conclusion: I think that thinking about testability is a key tool when it comes to improving your APIs and your code structure. In my personal experience, though, I find that I get that benefit even if I never write the tests; just thinking about them is enough.
So…
I am not saying that tests are good or tests are bad. I’m not saying that testing makes your code better or worse, easier to maintain or harder.
All I’m saying is that I performed the experiment, and came up with some personal conclusions that inform what I do now.
I still write tests, but I focus on areas where the cost of the code being wrong is greater than the cost of creating and maintaining the tests. When I’m writing code that deals with money, or with information that has personal value, I write unit tests. I will write higher-level functional tests that exercise a whole chunk of the app (does importing this spreadsheet cause the correct royalty adjustments to appear in that table?). And if one of those high-level tests fails, I might explore with some lower-level tests until I’ve isolated the issue.
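As an illustration of what such a high-level test might look like, here’s a sketch: the royalty-import scenario comes from the paragraph above, but every function and field name is my own invention.

```python
# A sketch only: the scenario is from the text, the names are invented.
import csv
import io

def import_royalties(csv_text: str) -> dict[str, float]:
    """Parse a royalties spreadsheet (as CSV) into per-title adjustments."""
    adjustments: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = row["title"]
        adjustments[title] = adjustments.get(title, 0.0) + float(row["adjustment"])
    return adjustments

def test_import_yields_correct_adjustments():
    # Exercises the whole import path end to end, not any one unit.
    sheet = "title,adjustment\nMy Book,12.50\nMy Book,-2.50\nOther,3.00\n"
    assert import_royalties(sheet) == {"My Book": 10.0, "Other": 3.0}
```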
I make a point of maintaining the “how would I test this” inner dialog when I’m coding, and I make time to react and refactor when the question has no easy answer.
But that’s me. I have no idea what you should do. Except…
If you’re just starting out, write tests. Keep doing it until it becomes reflexive. This probably takes two to five years.
If you’re working in a domain where the cost of failure is high, write tests. It is always going to be more expensive to write this kind of code, and the tests won’t be as costly in relative terms.
If you’re exploring, write tests. I do this a lot, and I find that the series of tests becomes a great reference; I look back on them to remember what I’ve learned.
More suggestions:
If you write exploratory tests, delete them before delivering the project.
If you’re a TDD person, and you strictly follow the no-code-without-a-failing-test mantra, go back and delete all the silly tests which have nothing to do with the application functionality.
Tests are code, and the more code you’re carrying around in a project, the more dependencies you have, and the more that can go wrong.
At the end of the day, I’m simply suggesting that you don’t have to do something because people tell you it’s a good idea, or because you yourself assume that it’s the right thing to do.
Test your assumptions, and refactor what you believe accordingly.
Reminds me of debates I've had about what is a unit test / integration test / end-to-end, yada yada test. I try to call them microtests these days (stealing a term I heard from GeePaw Hill) to avoid that conversation, as at the end of the day I think it's probably all subjective, and doesn't get at what matters more.
So when you've mentioned tests that test a wider range of functionality (I think of this as black-box testing, where I don't care how it is actually working inside), it struck a chord with me. This again appears to be about the "width" of the test (this is often the sticking point in discussions around what is a unit test etc.) - I've worked places with a purist test approach where we were actually testing the getters and setters of a class, for example.
I look back and think, sure, they were valuable, but at that level of granularity, whether or not they fit the correct definition of a unit test, at some level they become cumbersome, and I'd rather "widen" the microtest and test the class as a black box, unless there are some particularly interesting methods associated with it. And the chances of that depend on what sort of job the class is supposed to do (say, DTO vs. domain behaviour).
I could then decide: you know what, I've widened this test for pragmatic reasons, and because I see less downside in that than in maintaining the narrower ones. It's all good, it adds value, and is exactly what I need right now.
I don't really see why widening those tests even further, say from a single class treated as a black-box, to a family of classes all treated as a black-box, is a problem - that is to say, I would not write the smaller tests that tested the innards, just the wider ones that test composites of classes etc. And if at some point, I see a need to change that, to write a smaller test, then have at it.
Thanks for your insights :)