I’ve had several conversations with people who seemed overly proud of 100% code coverage in their unit tests. Obviously, coverage is a good thing: the more test cases, the lower the likelihood of a latent fault hiding in the software. But code coverage has its dark side, too. Take a look at this (extremely contrived) C example:

unsigned int noop(unsigned int x)
{
    unsigned int y = x << 4;
    return y >> 4;
}

There are no branches in the code, so the cyclomatic complexity couldn’t be better! In fact, I can get 100% test coverage with a single passing test case:

int main()
{
    /* a single call exercises every line of noop() */
    return (noop(5) == 5) ? 0 : 1;
}

However, there can be branches in the behavior of that function; here they come from C’s rules for unsigned integers, which simply discard any bits shifted out the top of the value. Any argument that uses the top 4 bits gets those bits truncated, so noop() does not actually return such an argument unchanged. For a function this simple it is possible to rigorously work out the complete behavior, and everyone can see that a second test case is required:

#include <limits.h>
int main()
{
    return ((noop(5) == 5) && (noop(UINT_MAX) == UINT_MAX)) ? 0 : 1;
}

Now the unit test demonstrates a failure, without changing the test coverage at all: on a machine where unsigned int is 32 bits wide, noop(UINT_MAX) returns 0x0FFFFFFF rather than 0xFFFFFFFF. For real software systems, there are two problems:

  • It may not be obvious to anyone working on the team that there are multiple data-dependent behavioral branches. Thus, even with the best of intentions, 100% coverage from a test-first development team may still allow a sneaky bug to slip through the cracks.
  • In many systems, testing is added after the original development. Suppose a company switches to test-first development and wants to retroactively add unit tests in order to prevent regressions. Or suppose the original developer of a large, cryptic codebase is gone, and the maintenance programmer decides that the best way to document the system’s behavioral gotchas is to write unit tests, so that later development which accidentally changes those gotchas gets caught by test failures. In either case, test coverage by lines of code is the easiest way to answer the “do I have enough tests?” question, and a developer racing toward 100% coverage to be efficient with their time (or the company’s dollar) is extremely likely to end up with a minimal test set that exercises every line yet fails to test significant pieces of system behavior (see the sketch after this list).
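
To make that last point concrete, here is a minimal sketch (my own illustration, not taken from any particular methodology) of what a behavior-oriented test for noop() might look like: a table-driven suite whose inputs are chosen by thinking about the data representation rather than the control-flow graph. Its line coverage is exactly the same as the single-case test above, yet it catches the truncation:

#include <limits.h>
#include <stdio.h>

unsigned int noop(unsigned int x)
{
    unsigned int y = x << 4;
    return y >> 4;
}

int main()
{
    /* inputs chosen from the data's boundaries, not the control-flow graph */
    unsigned int cases[] = {
        0u, 1u, 5u,
        UINT_MAX >> 4,        /* largest value that survives the round trip */
        (UINT_MAX >> 4) + 1u, /* smallest value that does not */
        UINT_MAX
    };
    size_t i;
    int failures = 0;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        if (noop(cases[i]) != cases[i]) {
            printf("FAIL: noop(%u) returned %u\n", cases[i], noop(cases[i]));
            failures++;
        }
    }
    return failures ? 1 : 0;
}

Note that UINT_MAX >> 4 and (UINT_MAX >> 4) + 1 bracket the exact boundary at which the function stops being a no-op, regardless of the width of unsigned int.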

Obviously, a piece of code that is never exercised has a 0% assurance rating. Reasonable (near-100%) code coverage is therefore necessary for assurance, but by itself it does not always provide a high level of assurance. I suppose this is where “test-case generation” tools come into play: being able to generate input sets that cover the data-dependent behavioral branches, or to measure coverage in terms of the parameters passed, could be hugely powerful against this sort of problem.
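
As a crude, hand-rolled stand-in for such a tool (again, my own sketch, not any particular product), here is a randomized property check in plain C: generate a large batch of inputs, assert the property that noop() should preserve its argument, and report the first counterexample. A fixed seed keeps any failure reproducible:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

unsigned int noop(unsigned int x)
{
    unsigned int y = x << 4;
    return y >> 4;
}

int main()
{
    int i;
    srand(1u); /* fixed seed so a failure is reproducible */
    for (i = 0; i < 100000; i++) {
        /* rand() is only guaranteed to produce 15 bits, so splice several
           calls together to reach the top bits of unsigned int */
        unsigned int x = ((unsigned int)rand() << 17)
                       ^ ((unsigned int)rand() << 2)
                       ^ (unsigned int)rand();
        if (noop(x) != x) {
            printf("counterexample: noop(%u) returned %u\n", x, noop(x));
            return 1;
        }
    }
    printf("no counterexample found\n");
    return 0;
}

A real generation tool would be smarter about steering inputs toward representational boundaries, but even blind random inputs find this particular bug almost immediately, since roughly fifteen out of every sixteen random values touch the top 4 bits.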
