The Lone Coder
Reflections for the Unsung Linux Saviours
by Ken O. Burtch
 
 

 Why and When To Use Test-Driven Development Effectively

"Barfield never made me an Anthroposophist, but his counterattacks destroyed forever two elements in my own thought. In the first place he made short work of what I have called my "chronological snobbery," the uncritical acceptance of the intellectual climate common to our own age and the assumption that whatever has gone out of date is on that account discredited. You must find why it went out of date. Was it ever refuted (and if so by whom, where, and how conclusively) or did it merely die away as fashions do? If the latter, this tells us nothing about its truth or falsehood. From seeing this, one passes to the realization that our own age is also "a period," and certainly has, like all periods, its own characteristic illusions. They are likeliest to lurk in those widespread assumptions which are so ingrained in the age that no one dares to attack or feels it necessary to defend them."

 

-- C. S. Lewis, Surprised by Joy
   (Quoted from Wikipedia)

Recently astronomers have claimed more than 450 stars have planets around them. Yet a few years ago, scientists said that we lacked the technology to find planets around other stars. What happened?


As I discussed in The Open Source Guide to the Solar System (Lone Coder August 2006), the definition of a planet and the science used to locate one are both rather tenuous. In the local neighbourhood, around 85% of stars are small, flare irregularly or have multiple stars. These conditions make planets unlikely, and certainly make planets like the Earth we know impossible. The Sun, the Earth and our solar system are true stellar rarities (Nearby Stars, Wikipedia).

Most of the research has focused on trying to find hypothetical giant planets, including "brown dwarfs" (that is, failed stars) and "hot Jupiters" (huge planets extremely close to their stars). But the evidence for these 450 planetary systems is sketchy and further research has begun rejecting some of them. Some nearby stars wobble but no planet has been viewed, meaning the wobbling could have a different cause (GJ_412, Wikipedia). Other claims of planets in the nearest systems have been refuted or questioned--for example, Barnard's Star and Lalande 21185. In one case the "planet" was an object in the background. And some of the evidence from the defunct Spitzer satellite is dubious, such as planets with hot sides that don't face their stars (Jet Propulsion Lab, Oct 19, 2010). Recent computer models suggest that "hot Jupiters" cannot actually exist: the forces of the star would tear a close giant planetary partner into pieces (NASA article).

Extraordinary claims require extraordinary proof. In the case of the search for planets outside our solar system, the evidence is early, sketchy and needs corroboration.

This month I'm investigating Test-Driven Development (TDD). What I thought was a simple software practice...essentially "test then code"...turned out to be the longest, most complicated Lone Coder article of the year. I'm going to do my best to describe here what TDD is, what it does, what it doesn't do, and what value it brings to a software project. This is not an easy task because there are many claims made about it: it makes better software, it saves time, or even that it's fun. Is there solid evidence that TDD is a breakthrough?

 TDD: What is it?

Test-Driven Development (or sometimes Test-Driven Design) is a software process that became popular around 2003. Unit tests are low-level, code-level "pinhole" tests that focus on validating how an isolated piece of a program behaves. TDD followers write a single unit test before any programming is done. Then they write only enough of the program to pass that one test. They continue in this way, writing a simple test and doing enough work to pass that test. Because there's no up-front design, a TDD practitioner assumes he will have to perform major rewrites as needed, but never proceeds to new functionality until all the existing tests pass.
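As a rough illustration of one pass through this cycle, here is a minimal sketch in Python using the standard unittest module. The function parse_price and its behaviour are hypothetical, invented only for this example. The test class is what a TDD follower would write first; the function is the "only enough" code written afterwards to make that one test pass.

```python
import unittest


# Step 2: only enough code to pass the test below. Under strict TDD this
# function is written after the test, and nothing the test does not
# demand is added. (parse_price is a hypothetical example function.)
def parse_price(text):
    """Convert a string such as "$4.50" into an integer number of cents."""
    dollars, cents = text.lstrip("$").split(".")
    return int(dollars) * 100 + int(cents)


# Step 1: the single unit test that was written first.
class ParsePriceTest(unittest.TestCase):
    def test_dollars_and_cents(self):
        self.assertEqual(parse_price("$4.50"), 450)
```

Saved to a file, this can be run with "python -m unittest" followed by the file name. Inputs such as "$4" or "$4.5" are not handled correctly by this version; under TDD each would get its own failing test before the function is extended.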

In a sense, TDD is a process of continual low-level regression testing. A regression test suite is a set of automated tests that you run to make sure the application still functions correctly after a change (to verify its previous capabilities didn't "regress", or break, after the change). Unlike regression tests, TDD stipulates that new tests must be created before writing new programming, and the test suite must be run after every change, no matter how insignificant.

Some people, like Eric Shupps, believe that TDD is about good unit testing (SPTDD: SharePoint and Test Driven Development, Eric Shupps, BinaryWave). TDD is not unit testing: it is a process that stipulates where and when these tests should be written and run. Unit tests can be used without the TDD approach. When TDD is used, a test is the first step in developing new functionality. If you write a unit test in parallel with new functionality, or otherwise after starting new functionality, or create more than one test up front, then you're not following the TDD approach. You can write good tests without using TDD. This is why TDD and good testing are not the same thing.

Because TDD links test writing and functionality writing, code coverage tools are commonly used to verify the test coverage. You don't need TDD to use coverage tools, but if you practice TDD, they are an implicit requirement.

Let's take a look at some of the other claims about TDD.

 Claim 1: TDD Eliminates All Documentation and Training

Some TDD advocates believe that TDD provides ample unit tests and the unit tests document what the project is supposed to do. So documentation is unnecessary.

Let's face it: many developers aren't very good at writing documentation. They find writing in simple English sentences harder than writing source code. Documentation isn't fun. This is one reason documentation is discarded during development.

Another reason is a concern over deadlines. Agile processes try to avoid unnecessary documentation, but some developers view all documentation as unnecessary waste and try to do none at all. The software will run regardless of whether documentation exists, and they don't see the bigger issues of team communication and training as important.

Good, useful documentation expresses the "why's" of a project, or the impact of the project in the real world. Such docs can include the goals of a project, tutorials, design rationales (why a particular solution was chosen), non-technical overviews and so on. Unit tests are programs, and like any source code, do not fill the need for these kinds of useful documentation (Agile people still don't get it, Cedric Beust).

Along the same lines, some TDD advocates say that training is no longer necessary: just look at the (thousands of) unit tests to learn how the software works. This kind of argument is used without TDD or even without unit tests at all. "Just read the code." The audience of unit tests is the source code, not people living in the real world dealing with issues not related to the programming.

Unit tests seem ill-equipped to replace documentation and training.

 Claim 2: TDD Eliminates All Up-Front Design

If design is about meeting requirements, then creating unit tests up-front is like a yes or no question: does the program meet this requirement? As you create a test and make the program pass it, the theory is that your tests dictate the design of the project.

In "Facts and Fallacies of Software Engineering", veteran researcher Robert L. Glass quotes studies that show that most project delays come from insufficient requirements gathering and changes in requirements as a project progresses. It's not surprising that some developers believe that if perfect requirements are difficult to obtain, then planning, like documentation, is a waste of time. They believe that you should just get on with the business of writing code and refactor when you hit impassable roadblocks, since that's what you're going to have to do anyway. However, Mr. Glass also explains that the requirement problems are best handled as early in the project as possible. The later you redesign, the more work it will require.

By the famous 80/20 rule, 80% of the delays for a project are caused by 20% of the problems. By eliminating all planning, there will be a huge amount of unnecessary refactoring that could have been eliminated early on using a small amount of forethought. Though requirements gathering seldom produces perfect requirements, doing no planning at all will guarantee that a project will have major, unexpected requirements: it will arrive very late and with a lot of wasted work.

If testing is moved to the beginning of the development cycle, something else must fall back to the end of the cycle. In TDD, that's refactoring. Refactoring becomes expendable—the work most likely to be neglected when a deadline looms.

Some languages, like Ada, even have the capability to verify a proposed design, a capability that is almost useless if no pre-planning is done.

So when TDD advocates claim their process eliminates the need for up-front design, it's really not true. As Fred Brooks said, "Great design does not come from great processes; it comes from great designers" (Master Planner: Fred Brooks Shows How to Design Anything, Wired July 2010).

 Claim 3: TDD Means Better Testing and Better Software

As I mentioned in the introduction, TDD does not mean unit testing. TDD is a strategy for applying unit tests as a project is being built. Ideally, the unit tests built with and without TDD should be the same.

"It's quite easy to get caught up in the technique of TDD and not pay attention to the way unit tests are written." ("The Art of Unit Testing", Roy Osherove, pg. 18)

As TDD advocates want to avoid waste, there is pressure to minimize testing. When TDD developers are afraid of over-testing, it's not always clear what the criteria for adequate unit testing are. Mr. Glass gives an example of how tests can be prioritized: you can test for meeting requirements, test the structure or integration of components, test the quality of execution and how well the application stays running, or you can focus on testing the worst risks to a project. A unit test's importance can depend on many different factors.

If unit tests are based on requirements, Mr. Glass pointed out that good requirements are seldom available at the start of a project. Requirements can also be ambiguous or have unforeseen gaps. For example, when parsing XML, how should unexpected tags be handled? Should they be silently ignored? Or flagged as exceptions? How do you know if you're testing too little or testing too much? If you're verifying exceptions, is it enough that an exception is thrown, the right exception is thrown, or the right message is included with the exception? Should warnings be tested or ignored as optional?
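To make the exception question concrete, here is a minimal sketch in Python's unittest showing the three levels of strictness. The names UnexpectedTagError and check_tag are hypothetical, invented for this illustration, and stand in for whatever a real XML handler would do with an unknown tag.

```python
import unittest


class UnexpectedTagError(ValueError):
    """Hypothetical error raised for a tag the parser does not recognise."""


def check_tag(tag, allowed=("title", "body")):
    """Hypothetical validator: reject any tag outside the allowed set."""
    if tag not in allowed:
        raise UnexpectedTagError(f"unexpected tag: <{tag}>")
    return tag


class CheckTagTest(unittest.TestCase):
    def test_some_exception(self):
        # Weakest check: any exception at all satisfies it.
        with self.assertRaises(Exception):
            check_tag("blink")

    def test_right_exception(self):
        # Stricter: the specific exception type must be raised.
        with self.assertRaises(UnexpectedTagError):
            check_tag("blink")

    def test_right_message(self):
        # Strictest: the type and the message text must both match.
        with self.assertRaisesRegex(UnexpectedTagError, "unexpected tag: <blink>"):
            check_tag("blink")
```

All three tests pass against the same code, so passing tests alone do not say which level of checking a project actually needs. That decision still has to come from the requirements.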

There are categories of errors that unit tests cannot detect. Problems like numeric overflows, memory leaks, wrong units of measurement, rounding errors, stack overflows and buffer overruns are only caught if a developer explicitly tests for them. Nor do unit tests validate the application as a whole. Code coverage tools will not guarantee that the unit tests will be the best unit tests. The tests are chosen by the developer. The tests only cover the program's functionality, and the testing may be lacking if the functionality is lacking. A test can even be partly right and partly wrong because it doesn't fully reflect the source code being tested. A code review will sometimes catch more bugs than a full set of unit tests. Some kinds of errors cannot be caught with automated testing at all.
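A small sketch in Python shows how a test can execute every line of a function and still miss this kind of problem. The functions split_bill and lost_cents are hypothetical, made up for this example.

```python
def split_bill(total_cents, people):
    """Hypothetical helper: each person's share of a bill, in cents."""
    return total_cents // people


def test_even_split():
    # This one test executes every line of split_bill, so a line-coverage
    # tool would count the function as fully covered.
    assert split_bill(900, 3) == 300


def lost_cents(total_cents, people):
    """Cents that go missing when the individual shares are added back up."""
    return total_cents - split_bill(total_cents, people) * people
```

Here lost_cents(1000, 3) is 1, because integer division silently drops a cent, and split_bill(1000, 0) raises ZeroDivisionError. Neither case is caught unless the developer thinks to write a test for it.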

Young developer Erik Snoeijs argues that doing tests defines your input and output prior to writing functionality. But the opposite is also true: writing functionality determines the inputs and outputs for the tests. So I don't find that argument compelling for better quality ("Why I think test driven development suck" [sic], Erik Snoeijs).

Some TDD advocates say that if a program passes the unit tests, it's ready to be put live into production. Unit tests are a useful tool for testing software but they cannot guarantee that the software is good, nor do they replace other forms of testing. It is possible to write good unit tests without TDD. So TDD doesn't produce better quality software.

 Claim 4: TDD Makes All Languages Equally Good

I once worked with a programmer who claimed that choosing a good computer language for a project didn't matter anymore because of unit tests. Languages were all the same these days and if you throw enough tests at a solution, you can be sure it works. So what difference does your choice of language make?

This claim is more about unit testing than TDD. A good language with strong features that promotes good programming practices can eliminate the need to do a lot of testing and can give you more confidence in your project. In "The Business Shell in an Age of Hype", I mentioned the Stephen F. Zeigler study which compared a large, identical project developed in two different programming languages: one language delivered the project in half the time and with many times fewer bugs in the final product. So your choice of language really does affect delivery.

A good process does not eliminate the need for a good language, and a good language can save a lot of development time.

 Claim 5: TDD Works Effectively on Large Projects

An implicit claim is that TDD works well for projects of any size, whether one person creating a small web site or a million-line application with a team of 70 programmers.

First, as I already mentioned, TDD is often accompanied by a lack of up-front planning. Without planning, responsibility cannot be partitioned across a large number of people. This increases the dependencies between people and creates more priority conflicts and refactoring disputes.

I also mentioned that leaving design considerations until the software must be refactored makes more work. Design changes are made most cheaply when they are undertaken as early in a project as possible (Glass). For a large project with high cost and complexity, TDD's approach of having no up-front design strategy may create large costs and delays.

Focusing too much on unit test coverage for bug removal can create design problems. As I pointed out earlier, TDD delays refactoring until late in the development cycle, making it an easy target to ignore as a "nice to have".

Good documentation and training are more important as a project gets big.

Besides these concerns, TDD requires programmers to remain idle while unit tests are run. All unit tests must be run since there's no guarantee—even with object mocking—that a change will not break unit tests for a different part of the program. Rod Coffin writes "The typical Red/Green/Refactor TDD cycle lasts around 5-10 minutes, and the developer usually manually runs the test that is driving the cycle between coding and refactoring iterations. This means that although the feedback loop is very quick, unintended side effects could break other tests and the developer might not receive this feedback until the entire test suite is run." ("Raising the Bar with Continuous Testing") On a large project, running a set of unit tests can take a lot of time (possibly hours) during which the programmer cannot proceed.

An extensive unit test suite is often larger than the program itself. Jacob Proffitt points out that, as a rule of thumb, 20% of tests give 80% of the value (TDD or POUT (Plain Old Unit Testing), Jacob Proffitt, The Run Time). Depending on the time and money available, it may simply not be practical to use extensive unit testing on a large project when most of the tests have little return for the investment. A peer review can take less time than writing extensive unit tests and can catch more bugs. Nor are all errors equal: Tom DeMarco in "Slack" observes that perfect software is often a waste of time, since users expect bugs and accept them provided there are reasonable work-arounds. So, depending on your application's needs, the business priorities and bug removal techniques available, extensive unit tests may be wasteful.

TDD may become impractical when the test suite becomes large. TDD assumes the programmer and the test writer are the same person. Having a large test suite may become so burdensome that dedicated developers may have to be hired to write and manage the unit tests. The developers are no longer writing the tests. This breaks the spirit of TDD, where the developers write the tests and the functionality in a tightly integrated way. When maintaining a large unit test suite (larger than the program itself), the biggest refactoring cost may be updating the tests themselves. This is especially frustrating if 80% of these tests have little value.

Depending on the project, a fanatically tested program delivered too late may not have the same value as a moderately tested program delivered on-time.

Extensive unit testing can work against language features for scaling-up applications. Developers may, for example, make everything "public" in their objects to make testing easier, or force all functions to return a value that can be tested (TDD, Wikipedia). Although unit tests do not explicitly endorse such practices, "just get the tests to pass" is a common motto for cutting corners. Failing to use the programming language effectively makes large projects riskier and more costly to develop and maintain as they grow.

There are several aspects of strict TDD that may work against development in a large project.

 Claim 6: TDD Reduces Initial Development Time

Many TDD advocates believe that developers can get more work done using TDD.

Getting more work done is the driver behind many new technologies and techniques. The belief that TDD speeds development is usually justified by the lack of documentation, training and up-front design. I've already shown that documentation and training are necessary parts of any project, and that skipping up-front design can make a lot more work as a project deadline looms. So TDD can actually slow development.

Another justification comes from the abundance of unit tests: bugs can be quickly caught so programmers can spend less time testing their software.

Remember: unit tests are not the same thing as TDD. To answer this claim, consider that TDD requires unit tests and a program, but enforces writing tests up front. If TDD and non-TDD approaches both end up with the same tests and program in the end, will the TDD approach...writing tests up front...produce a program faster than writing the tests concurrently or immediately after each bit of functionality is added? It is unlikely.

A program and its tests compose one integrated solution. Since a test cannot run without the corresponding functionality, and the functionality can't be validated without the corresponding unit test, it seems unlikely that doing tests up front will make development time significantly faster (or worse, for that matter). The same solution must be written at some point.

As I mentioned before, there are aspects of TDD that may not work well with large projects, such as writing unit tests that have low debugging value, the time it takes to run unit tests, etc. These will slow a developer's work.

InfoWorld Editor Andrew Binstock raises a different concern: TDD is not a natural way to think. Breaking up a program by unit tests can be disruptive to solving the larger problem (cited below).

There's a lot of evidence to suggest TDD will slow development, not make it faster.

 So is TDD any good?

I've spent a couple of months investigating TDD. There are several claims about TDD that are questionable. Despite the defenses of people like Gary Bernhardt (The Limits of TDD), a little examination shows that these claims are dubious. A team looking to use TDD should weigh these claims carefully.

Behind the hype, are there things that TDD handles well? Let's take a look at some cases that may not be as impressive-sounding as the hype but may deliver true value to a business.

 Good Use 1: TDD is good for Java

The Java community has a lot of interest in TDD.

Mr. Binstock argues that TDD works well for teaching Java due to Java's hard-to-read error messages and illegible stack traces. These can be daunting to people learning Java. By forcing unit tests to be done for tutorials, a learner can get more useful error messages back as they type up example programs. However, this raises the question of what happens when a student's unit tests—often larger than the program itself—throw stack traces (Learning Java Via TDD: An impressive approach). So I don't find this argument compelling.

More significant are Java's weak features. More powerful languages have features for designing large applications. TDD is used as a workaround: the tests provide a crude specification of what a project should do where the language has no specification or high-level organizing features.

An additional problem with Java is the practice of installing third-party classes. With the propensity of many Java developers to grab plug-in classes off the Internet, TDD may help to ease difficulties in redesigning or switching classes, which can hide implementation surprises across similar classes, while quickly confirming the application still functions in the expected way before testing all the features manually.

This is an imperfect solution. If your project depends on unit tests to take the place of specification features, you need a long-term solution that provides these features. If you're stuck with Java, TDD provides a bandage to slow the bleeding.

 Good Use 2: TDD may help with Continuous Integration

Some companies want their software to be able to be released at a moment's notice. Continuous integration (CI) refers to automated building of the software, such as after submitting changes or on a fixed schedule. This ensures all parts compile together or that the overall functionality works. (Whether CI meets all its claims, and what the hidden costs are, is a subject for another blog.)

Rob Harwood argues that you can't do CI without TDD ("Faster Feedback and Why You Want It"). However, TDD focuses on unit testing and doesn't address when or how integration tests (or other forms of error removal) should be handled.

One advantage of TDD unit tests is that they act as a safety net if incomplete code has been checked into the source repository. This is particularly a problem with source control software like SVN that has only one shared repo. Using TDD, the software is always ready to be regression tested the moment an immediate deployment is announced. The unit tests can be run prior to the integration process. In the best case, tests will fail for incomplete features that are accidentally checked into the code base (since tests are written first). Those features can be quickly identified and disabled.

If rapid, tested releases are mandatory, then these trade-offs may be worth it for a business. I used the word "may" because this is a rare case. Using source control software that supports local saves (e.g. Git) reduces the risk of committing incomplete work. And the drawbacks of TDD (e.g. being a liability for large projects) may outweigh the advantages in some CI scenarios.

 Good Use 3: For Variety

As strange as it sounds, in most projects TDD has little positive or negative impact. Developers may want to write their tests first just to "change things up a little".

Mr. Binstock argues that TDD works in the opposite way that people think: people enjoy building solutions, not building problems, and TDD is about making many trivial, uninteresting problems and solving them without foresight. TDD makes development mechanical, unchallenging and uninteresting. So once the novelty wears off, a team may want to switch to another development method.

 Good Use 4: TDD is good where Testing is Neglected

The greatest value from TDD is one that I haven't read in any blogs: improving quality by pushing back against business schedules.

In a company with realistic schedules, unit testing under TDD is much the same as unit testing without TDD. As Andrew Dalke writes in his blog, "good testing practices without TDD would have given the same [positive] results." (Problems with TDD, Andrew Dalke).

Aggressive schedules (that is, unrealistic, fantasy schedules) can be enforced by either panic-stricken management or over-confident developers. Error removal often takes 30% to 40% of development time (Glass). When debugging is treated as a "nice to have", not a requirement, error removal gets reduced or abandoned altogether. Releasing shoddy work is a threat to the developer's career.

By demanding unit testing up-front, some testing gets done before the functionality is written. Since the project cannot be released without the functionality, management will have no choice but to delay the project until the functionality is implemented...and has undergone unit testing even if peer reviews, QA testing and other forms of bug removal are neglected. So TDD is a defense for a developer to protect his/her reputation.

 Conclusion

When astronomers talk about their high levels of certainty of finding planets, they are referring to how certain they are that their machines work, or the authenticity of their techniques, not how certain a planet is really there. Since we don't know how many solar systems there are out there, we can't know the effectiveness of the search for them. Nevertheless, people get excited about the claims made by astronomers and take them out of context.

In software development, people are desperate for a miracle, a new technique or technology that will make programming more effective. As Robert L. Glass has pointed out, most "breakthroughs" are really hype—unrealistic and unsubstantiated claims. (The Business Shell in an Age of Hype, The Lone Coder).

The truth is, TDD is an effective way to fight aggressive schedules. If you're using Java, or are using CI, or are stuck in a rut and want to try a different process, it might have some value. But don't expect "testing before coding" to produce miracles: in many cases, TDD will have little benefit.

TDD has its downside. The process tends to work against good testing and planning. It may be a liability to large-scale, complex projects. And, in the long run, it may simply bore developers and encourage them to move on to their next job.

Most TDD advocates stop short of claiming that TDD cures cancer or raises the dead. Mr. Harwood brags that TDD is the "next level beyond basic unit testing". Others claim it eliminates documentation, training and requirements gathering, improves quality or accelerates development. With any unfamiliar technology or technique, evaluate its strengths and weaknesses and form an opinion. But beware of outrageous claims and examine them carefully.

Now if only there was "comment driven development" to ensure work is properly commented before starting to write code.

December 15, 2010 
