August 23rd, 2007

Are you confident in your test coverage?

A few days ago I was adding some extra logic to one of the modules that I maintain. Unlike the UI-related modules, this one has a fairly large set of test cases (about 110 to 120) that exercise its various rules. And so, as I was adding some code, I caught myself thinking: instead of proving the conditions in my head (or on paper) before writing the code, let me just go ahead and write something that looks correct and wait for the test cases to tell me whether it is. This is wrong on so many levels, and thankfully the test cases that went through this path failed. But it underscores the imaginary safety of test-driven development.

Going back about 20 years, programming practices were much more rigorous (which doesn't necessarily mean that the resulting code had fewer errors). In most development environments, you had one big mainframe computer and a "slot" every few hours to run the latest version of your code. And so, you had to be pretty sure that you had ironed out all the obvious bugs before your slot came up, because the next chance to check your fixes would come only three or four hours later. The end result is that, despite the "harmful" goto statements, spaghetti code and lack of formally defined design patterns, developers sat in front of their hand-written (or printed) code and traced all possible execution paths before sending it to the computer.

Since then, hardware has become so cheap and powerful that the present generation doesn't even think about the save-compile-run cycle anymore. With incremental compilation in Eclipse, you don't even notice that the code is being compiled (unless you touch code that affects a lot of other places). And so, you might find yourself rushing to code before properly thinking about the design and all the flows. This is especially pronounced with test-driven development and agile practices, which encourage this style. I call it lazy programming.

I previously referred to this as the imaginary safety of test-driven development. As long as all the tests pass, the software is working as it should. Don't worry about dynamic typing and the problems that can only be found at runtime: if you have good test coverage, you'll never get these problems. Which brings me to the question: what is good test coverage?

Of course, we have nice tools such as Clover and Emma that produce visually appealing coverage reports. Once your unit / integration / … tests cover 100% of the lines, you're done, right? Not so fast. This brings me back to a topic that I studied for quite some time during my last two years at university: formal verification.

This is quite an interesting and challenging field: given the definition of a problem and a solution, decide whether the solution really solves the problem. It works really well for hardware (especially VLSI) and is in fact an indispensable tool for verifying hardware chips (the FDIV bug is pretty much the only significant commodity hardware bug I have heard of in the last ten years). While some of the techniques work on finite-state automata, others have been extended to handle parametrized and infinite domains. However, these techniques still don't scale well from hardware to software.
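To make the hardware/software contrast a bit more concrete, here is a small sketch of the brute-force end of verification (the functions and names are hypothetical, chosen only for illustration): when the input domain is finite and small, you can check an implementation against its specification for every possible input. Over all pairs of byte values that is 65,536 checks; over all pairs of 32-bit ints it would be 2^64, which is hopeless, and real programs have far larger state spaces than a pair of ints.

```java
import java.util.function.IntBinaryOperator;

class ExhaustiveCheck {
    // Hypothetical specification: the floor of the exact average, computed
    // in long arithmetic so that the sum cannot overflow.
    static int averageSpec(int a, int b) {
        return (int) Math.floorDiv((long) a + b, 2L);
    }

    // Candidate 1: an overflow-free bit trick, based on the identity
    // a + b == 2 * (a & b) + (a ^ b); ">> 1" is an arithmetic (flooring) shift.
    static int averageBitTrick(int a, int b) {
        return (a & b) + ((a ^ b) >> 1);
    }

    // Candidate 2: the "obvious" version. Java integer division truncates
    // toward zero, so it disagrees with the spec whenever a + b is negative and odd.
    static int averageNaive(int a, int b) {
        return (a + b) / 2;
    }

    // Compares a candidate with the spec on every pair in [min, max] x [min, max]
    // and returns the number of disagreements. Zero mismatches is a proof of
    // correctness for that domain only, and it is feasible only because the
    // domain is finite and small.
    static long countMismatches(IntBinaryOperator candidate, int min, int max) {
        long mismatches = 0;
        for (int a = min; a <= max; a++) {
            for (int b = min; b <= max; b++) {
                if (candidate.applyAsInt(a, b) != averageSpec(a, b)) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int min = Byte.MIN_VALUE, max = Byte.MAX_VALUE;
        System.out.println("bit trick: "
                + countMismatches(ExhaustiveCheck::averageBitTrick, min, max) + " mismatches");
        System.out.println("naive:     "
                + countMismatches(ExhaustiveCheck::averageNaive, min, max) + " mismatches");
    }
}
```

Real verification tools are much smarter than this nested loop, but the underlying problem is the same: the number of states to consider grows explosively with the size of the input.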

Unless we're talking about primitive programs, the problem domain is infinite, and this is magnified even further nowadays by the shift to multi-core systems and distributed, failure-prone environments. Having 100% line coverage doesn't mean anything by itself: a line that was executed once with one input can still be wrong for another. Not only that, but for more complicated systems you might have a hard time coming up with the correct test cases (expected results). While this is true for traditional upfront design, it is even more so for the agile "refactor as you go while we can live without explicit business behavior" techniques. "All my tests pass" means exactly that: all your tests pass. Nothing less and nothing more. You can cover only so much of an infinite domain with a finite set of test cases.
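A small illustration of why line coverage proves so little, as a sketch in Java (the class and method names are made up for this example): the single test in `main` executes every line of `midpoint`, so a line-coverage tool would report 100%, and the test passes. Yet the method returns a negative index for large inputs, because `low + high` overflows. This is the same class of overflow that went unnoticed for years in many binary search implementations.

```java
class CoverageIsNotCorrectness {
    // Hypothetical helper: the midpoint of a non-negative index range [low, high].
    // The sum silently wraps around once low + high exceeds Integer.MAX_VALUE.
    static int midpoint(int low, int high) {
        return (low + high) / 2;
    }

    // One way to avoid the overflow, assuming 0 <= low <= high.
    static int safeMidpoint(int low, int high) {
        return low + (high - low) / 2;
    }

    public static void main(String[] args) {
        // The entire "test suite": one case. It executes every line of
        // midpoint(), so a line-coverage report shows 100%, and it passes.
        if (midpoint(2, 6) != 4) {
            throw new AssertionError("midpoint(2, 6)");
        }
        System.out.println("all tests pass");

        // An input the suite never tried: the sum wraps around to -12.
        int low = Integer.MAX_VALUE - 10, high = Integer.MAX_VALUE;
        System.out.println("midpoint:     " + midpoint(low, high));     // prints -6
        System.out.println("safeMidpoint: " + safeMidpoint(low, high)); // prints 2147483642
    }
}
```

Adding the large-value case to the suite would catch this particular bug, but it would not change the coverage number at all, which is exactly the point: the report measures which lines ran, not which inputs were considered.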

Not all is lost, of course. Don't blindly rely on unit tests and code coverage. Think about the code before you start pounding on the keyboard (and hopefully, before you start pounding out those test case skeletons). An interesting approach explored in the last few years tries to address the "state explosion" of real-world programs by generating test cases randomly (it has been applied successfully to the Firefox JavaScript engine, finding 280 bugs). This, of course, places an even greater burden on the test layer: since the test cases are randomly generated, it needs to provide a way to save failing test cases and rerun them later to get a reproducible scenario.
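Here is a minimal sketch of the save-and-replay requirement (an illustration of the idea, not a model of the tool mentioned above): each random test case is derived entirely from its own seed, the seeds of failing cases are recorded, and a failure is reproduced by regenerating the case from its seed. The leap-year function, its bug, the year range and all class and method names are hypothetical; `java.time.Year.isLeap` (Java 8+) serves as the oracle that supplies the expected results.

```java
import java.time.Year;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class RandomTesting {
    // Hypothetical code under test: it forgets the "divisible by 400" rule,
    // so it wrongly reports years such as 2000 as non-leap.
    static boolean isLeapUnderTest(int year) {
        return year % 4 == 0 && year % 100 != 0;
    }

    // Derives a test input from a seed. java.util.Random is deterministic
    // for a given seed, so the same seed always yields the same year.
    static int generateYear(long seed) {
        return 1 + new Random(seed).nextInt(9999); // a year in [1, 9999]
    }

    // Runs one randomly generated case against the oracle.
    // Returns true if the case passes.
    static boolean runCase(long seed) {
        int year = generateYear(seed);
        return isLeapUnderTest(year) == Year.isLeap(year);
    }

    // Generates `count` cases from a master seed and returns the seeds of the
    // failing ones. A real harness would persist these (for example, to a file)
    // so that they can be rerun later as regression tests.
    static List<Long> findFailingSeeds(long masterSeed, int count) {
        Random master = new Random(masterSeed);
        List<Long> failing = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            long seed = master.nextLong();
            if (!runCase(seed)) {
                failing.add(seed);
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        List<Long> failing = findFailingSeeds(42L, 10_000);
        System.out.println(failing.size() + " failing cases out of 10000");

        // Replaying a saved seed regenerates exactly the same failing input.
        for (long seed : failing.subList(0, Math.min(3, failing.size()))) {
            System.out.println("seed " + seed + " -> year " + generateYear(seed)
                    + ", fails again on replay: " + !runCase(seed));
        }
    }
}
```

Storing a seed per case, rather than a single seed for the whole run, is a deliberate choice here: it lets one failing case be replayed in isolation without regenerating the thousands of cases that came before it.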