Too much of a good thing: the trade-off we make with tests

Monday, February 5, 2024

I've worked places where we aspired to (but did not reach) 100% code coverage. We used tools like a code coverage ratchet to ensure that the test coverage always went up and never down. This had a few effects.

One of them was the intended effect: we wrote more tests. Another was unintended: we would sometimes write unrelated tests or hack things to "cheat" the ratchet. For example, if you refactor well-tested code to be smaller, coverage goes down (the covered lines now make up a smaller share of the codebase) even though the codebase is better, so you have to work around that.
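For concreteness, here's a minimal sketch of what a coverage ratchet might look like. This is illustrative, not the tool we actually used; the baseline file name and invocation are invented:

    # ratchet.py -- illustrative coverage ratchet (hypothetical sketch).
    # Fails the build if coverage drops below the recorded baseline, and
    # raises the baseline whenever coverage improves.
    import json
    import sys
    from pathlib import Path

    BASELINE_FILE = Path("coverage_baseline.json")  # hypothetical file name

    def check_ratchet(current: float) -> None:
        baseline = 0.0
        if BASELINE_FILE.exists():
            baseline = json.loads(BASELINE_FILE.read_text())["coverage"]
        if current < baseline:
            print(f"Coverage dropped: {current:.1f}% < baseline {baseline:.1f}%")
            sys.exit(1)  # fail CI
        if current > baseline:
            # Ratchet up: the new, higher coverage becomes the floor.
            BASELINE_FILE.write_text(json.dumps({"coverage": current}))
            print(f"Baseline raised: {baseline:.1f}% -> {current:.1f}%")

    if __name__ == "__main__":
        check_ratchet(float(sys.argv[1]))  # e.g. `python ratchet.py 87.3`

That ratchet-up step is exactly what makes the refactoring problem painful: delete well-covered code and the percentage falls below a baseline that can only ever go up.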

It's well known that targeting 100% code coverage is a bad idea. The questions are why, and where we should draw the line.

There are many reasons why you don't want to hit 100% code coverage in a typical scenario. But one that's particularly interesting to me is the value trade-off you make when you're writing tests.

Why do we write tests?

Tests ultimately exist only to serve the code we write, and that code is there to solve a problem. If adding a test doesn't help you solve the problem, it's not a great use of time and money.

The way that tests help you solve problems is by mitigating risk. They let you check your work and validate that it's probably reasonably correct (and if you want even higher confidence, you start looking to formal methods). Each test gives you a bit more confidence in the code that's tested, because it means that in more configurations and with more inputs, you got the result you expected.

Test code itself does not directly deliver value. It's valuable for its loss prevention, both in terms of the real harm of bugs (lost revenue, violated privacy, errors in results) and in terms of the time spent detecting and fixing those bugs. We don't like paying that cost, so we pay for tests instead. It's an insurance policy.

How much risk do you want?

When you pay for insurance, you are offered a menu of options. You can get more coverage—a lower deductible, higher limits, and extra services—if you pay a higher premium. Selecting your policy is selecting how much risk you want to take on, or how much you can afford to avoid.

In the same way that insurance reduces the risk of a sudden outflow of cash, a test suite reduces the risk of a sudden major bug with its direct costs and labor costs. And just like with insurance, we have different options for how many tests we have. We can't afford all the options. We're not going to formally verify a web app[1]. But we are going to write tests, so we have to choose what we pay in the premium and what we pay when an accident happens.

If you aim for 100% code coverage, you're saying that any risk of a bug is a risk you want to avoid. And if you have no tests, you're saying that it's okay to have severe bugs with maximum cost.

Detecting when you're paying too much for tests

The question ultimately becomes: how do we decide how much risk to take on with testing? This is often an implicit decision: someone reads an article that says "more code coverage good," adds a coverage ratchet tool[2], and then people start writing more tests because it's our culture, man!

The better way is to be deliberate about the decision. This is something where we as ICs can inform management about the risks and the costs, and ultimately management decides how much to invest in testing and how much risk to mitigate.

Note, however, that there are some tests we have an ethical obligation to write. If you're working on a pacemaker, you have a much higher minimum bar for testing (and other forms of assurance), because your software will kill people if you get it wrong[3]. It's unacceptable for management or engineers to try to take on that risk. For the rest of this discussion, I'm going to assume that we're above that minimum bar and within the range of risk that it's both legal and ethical to choose from.

Part of the trouble with communicating this risk-cost trade-off is that it's difficult to quantify. But there are ways to make it clearer, and it's worth having that discussion to make the trade-off explicit.

To measure the trade-off, you ultimately need to have two numbers:

  • The cost of writing tests. To get this number, measure how much time is spent on testing. If you have a dedicated test team, all of their time counts. You also include the portion of each task's time that goes to writing tests. You don't need to measure this for every ticket; a sampling is enough to get the breakdown.
  • The cost of bugs. Getting this number is more complicated. Some bugs have a clear cost if they cause a customer to churn in an attributable way, but many bugs are more implicit in how they erode trust and cause harm. You can measure the time your engineering team spends triaging and fixing bugs, which is one of the primary costs. The rest, the direct costs of bugs, you'll have to estimate with management and product. The idea here is just to get close enough to understand the trade-off, not to be exact.

Once you have these two numbers, you can start to back into the right trade-off for you. The obvious first check is that the cost of writing tests should be lower than the cost of bugs, or it's clearly not worth it and you've made a bad trade-off[4]!
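To make that concrete, here's a back-of-the-envelope version of the comparison. Every number below is invented for illustration; plug in your own measurements and estimates:

    # Back-of-the-envelope test-vs-bug cost comparison (all numbers hypothetical).
    HOURLY_COST = 100  # fully loaded engineer cost, dollars/hour

    # Cost of writing tests: measured from a sampling of recent tickets.
    test_hours_per_month = 160  # e.g. ~25% of a four-engineer team's time
    cost_of_tests = test_hours_per_month * HOURLY_COST

    # Cost of bugs: measured triage/fix time, plus direct costs estimated
    # with management and product (churn, refunds, eroded trust).
    bug_hours_per_month = 80
    direct_bug_costs = 5_000
    cost_of_bugs = bug_hours_per_month * HOURLY_COST + direct_bug_costs

    print(f"tests: ${cost_of_tests:,}/month, bugs: ${cost_of_bugs:,}/month")
    if cost_of_tests > cost_of_bugs:
        print("Premiums exceed the losses: probably paying too much for tests.")

Even rough numbers like these turn an implicit cultural decision into one you can actually put in front of management.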

When you communicate those numbers to management, be sure to also highlight the opportunity cost of writing tests instead of features. If your company is in a make-or-break moment, it may be a much better idea to go all-hands-on-deck and minimize tests to maximize short-term feature productivity. This isn't a free trade-off, because you'll pay for those bugs down the road, and the cost will compound, but for startups with very short runways, it can make sense.

Another signal that you're making the wrong trade-off is if you can't quantify the cost of bugs because you don't have enough bugs to quantify. That means that you're probably spending too much time catching and preventing bugs, and you should spend more time creating or improving features. (That, or you're not getting bug reports, which is bad for all sorts of different reasons.)


How do you manage this trade-off on your team? Have you made it explicit, or is it implicit?


[1] If you are doing formal verification of web apps, please let me know. I'd love to be a fly on the wall and learn more.

[2] I'm looking at you, Kyle.

[3] This raises an ethical question: if it's wrong to write bad code for a pacemaker because that could kill someone, is it also wrong to write good code for a weapon, since that would also kill someone?[5]

[4] For things like pacemakers, the cost of bugs is infinite, so this is always satisfied.

[5] Yes, but also, it's complicated. Writing bad code could also kill someone else. The only winning move is to not play (where the game here is "writing code for weapons").

