The Lone Coder Reflections for the Unsung Linux Saviours
by Ken O. Burtch
Why and When To Use Test-Driven Development Effectively
"Barfield never made me an Anthroposophist, but his counterattacks destroyed forever two elements in my own thought. In the first place he made short work of what I have called my "chronological snobbery," the uncritical acceptance of the intellectual climate common to our own age and the assumption that whatever has gone out of date is on that account discredited. You must find why it went out of date. Was it ever refuted (and if so by whom, where, and how conclusively) or did it merely die away as fashions do? If the latter, this tells us nothing about its truth or falsehood. From seeing this, one passes to the realization that our own age is also "a period," and certainly has, like all periods, its own characteristic illusions. They are likeliest to lurk in those widespread assumptions which are so ingrained in the age that no one dares to attack or feels it necessary to defend them."
-- C. S. Lewis, Surprised by Joy (Quoted from Wikipedia)
Recently astronomers have claimed
more than 450 stars have planets around them. Yet a few years ago, scientists
said that we lacked the technology to find planets around other stars. What
happened?
As I talked about in
The Open Source Guide to the Solar System (Lone Coder August 2006), what exactly a planet is and the science
to locate one are rather tenuous. In the local neighbourhood, around 85% of
stars are small, flare irregularly or have multiple stars. These make
planets unlikely, and certainly planets like the Earth we know impossible.
The Sun, the Earth and our solar system are true stellar rarities
(Nearby Stars,
Wikipedia).
Most of the research has focused on trying to find hypothetical
giant planets, including "brown
dwarfs" (that is, failed stars) and "hot Jupiters" (huge planets extremely
close to their stars). But the evidence for these 450 planetary systems is
sketchy and further research has begun rejecting some of them. Some nearby stars
wobble but no planet has been viewed, meaning the
wobbling could have a different cause
(GJ_412, Wikipedia).
Other claims of planets in the nearest systems have been refuted or questioned--for
example,
Barnard's Star and
Lalande 21185. In one
case the "planet" was an object in the background. And some of the
evidence of the defunct Spitzer satellite is dubious—such as planets with
hot sides that don't face their stars
(Jet Propulsion Lab, Oct 19, 2010). Recent computer models
suggest that "hot Jupiters" cannot actually exist: the forces of the star
would tear a close giant planetary partner into pieces
(NASA article).
Extraordinary claims require extraordinary proof. In the
case of the search for planets outside our solar system, the evidence is
early, sketchy and needs corroboration.
This month I'm investigating Test-Driven Development (TDD).
What I thought was a simple software practice...essentially "test then
code"...turned out to be the longest, most complicated Lone Coder article of
the year. I'm going to do my best to describe here what TDD is, what it does,
what it doesn't do, and what value it brings to a software project. This is
not an easy task because there are many claims made about it: it makes better
software, it saves time or even it's fun. Is there solid evidence that TDD
is a breakthrough?
TDD: What is it?
Test-Driven Development (or sometimes Test-Driven Design) is a software
process that became popular around 2003. Unit tests are low-level, code-level
"pinhole" tests that focus on validating an isolated piece of a program. TDD
followers write a single unit test before any
programming is done. Then they write only enough of the program to pass that
one test. They continue in this way, writing a simple test and
doing enough work to pass that test. Because there's no up-front design,
a TDD practitioner assumes he will have to perform major rewrites as needed,
but he never proceeds to new functionality until all the existing tests pass.
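As a minimal sketch of one turn of that cycle, assume Python's built-in unittest module and a hypothetical slugify() function (both are my choices for illustration, not something taken from the TDD sources I cite). The test is written first, and the code is only as big as the test demands:

```python
import unittest

# Step 1: the test is written first, before slugify() exists.
# Run at this point, it fails because the name is undefined.
class SlugifyTest(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

# Step 2: write only enough of the program to pass that one test.
# Nothing here handles punctuation or accents, because no test asks for it.
def slugify(title):
    return title.lower().replace(" ", "-")
```

Saved to a file, this can be run with `python -m unittest`. A strict TDD practitioner would now write one more failing test (for punctuation, say) before touching slugify() again.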
In a sense, TDD is a process of continual low-level regression testing. A
regression test suite is a set of automated tests that you run to make sure
the application still functions correctly after a change (to verify its previous
capabilities didn't "regress", or break, after the change). Unlike regression tests,
TDD stipulates that new tests must be created before writing new programming,
and the test suite must be run after every change, no matter how
insignificant.
Some people, like Eric Shupps, believe that TDD is about good unit testing
(SPTDD: SharePoint and Test Driven Development, Eric Shupps, BinaryWave).
TDD is not unit testing: it is a process that stipulates where and when
these tests should be written and
run. Unit tests can be used without the TDD approach. When TDD is used, a test
is the first step in developing new functionality. If you write a unit test in
parallel with new functionality, or otherwise after starting new functionality,
or creating more than one test up front, then you're not following the
TDD approach. You can write good tests without using TDD. This is why TDD
and good testing are not the same thing.
Because TDD links test writing and functionality writing, code coverage
tools are commonly used to verify the test coverage. You don't
need TDD to use coverage tools, but if you practice TDD, they are an implicit
requirement.
Let's take a look at some of the other claims about TDD.
Claim 1: TDD Eliminates All Documentation and Training
Some TDD advocates believe that TDD provides ample unit tests
and the unit tests document what the project is supposed to do. So
documentation is unnecessary.
Let's face it: many developers aren't very good at
writing documentation. They find writing in simple English sentences
harder than writing source code. Documentation isn't fun. This is one reason
documentation is discarded during development.
Another reason is a concern over deadlines. Agile
process tries to avoid unnecessary documentation, but some developers view all
documentation as unnecessary waste and try to do none at all. The software
will run regardless of whether documentation exists and they don't
see the bigger issues of team communication and training as important.
Good, useful documentation expresses the "why's" of a project, or the
impact of the project in the real world. Such docs can include the
goals of a project, tutorials, design rationales (why a particular
solution was chosen), non-technical overviews and so on. Unit tests
are programs, and like any source code, do not fill the need for these kinds of
useful documentation (Agile people still don't get it, Cedric Beust).
Along the same lines, some TDD advocates say that training is no longer
necessary: just look at the (thousands of) unit tests to learn how the
software works. This kind of argument is used without TDD or even without
unit tests at all. "Just read the code." The audience of unit tests
is the source code, not people living in the real world dealing with issues
not related to the programming.
Unit tests seem ill-equipped to replace documentation and training.
Claim 2: TDD Eliminates All Up-Front Design
If design is about meeting requirements, then creating unit tests up-front
is like a yes or no question: does the program meet this requirement?
As you create a test and make the program pass it, the theory is that your
tests dictate the design of the project.
In "Facts and Fallacies of Software Engineering", veteran
researcher Robert L. Glass quotes studies that show that most project delays come
from insufficient requirements gathering and changes in requirements as
a project progresses. It's not surprising that some developers believe
that if perfect requirements are difficult to obtain, then planning,
like documentation, is a waste of time. They believe that you should
just get on with the business of writing code and refactor when you hit
impassable roadblocks, since that's what you're going to have to do anyway.
However, Mr. Glass also explains that the requirement problems are best
handled as early in the project as possible. The later you redesign,
the more work it will require.
By the famous 80/20 rule,
80% of the delays for a project are caused by 20% of the problems. By
eliminating all planning, there will be a huge amount of unnecessary
refactoring that could have been eliminated early on using a small
amount of forethought. Though requirements gathering seldom produces
perfect requirements, doing no planning at all will guarantee that
a project will have major, unexpected requirements: they arrive very late
and cause a lot of wasted work.
If testing is moved to the beginning of the development
cycle, something else must fall back to the end of the cycle. In TDD,
that's refactoring. Refactoring becomes expendable—the work most
likely to be neglected when a deadline looms.
Some languages, like Ada, even have the capability to verify a proposed
design, a capability that is almost useless if no pre-planning is done.
So when TDD advocates claim their process eliminates the need for up-front
design, it's really not true. As Fred Brooks said, "Great design does not come from great processes; it comes from great designers".
(Master Planner: Fred Brooks Shows How to Design Anything, Wired July 2010)
Claim 3: TDD Means Better Testing and Better Software
As I mentioned in the introduction, TDD does not mean unit testing.
TDD is a strategy for applying unit tests as a project is being
built. Ideally, the unit tests built with and without TDD should be the same.
"It's quite easy to get caught up in the technique of TDD and not pay
attention to the way unit tests are written." ("The Art of Unit Testing",
Roy Osherove, pg. 18)
As TDD advocates want to avoid waste, there is a pressure to minimize
testing.
When TDD developers are afraid of over-testing, it's not always clear
what the criteria for adequate unit testing are. Mr. Glass gives an
example of how tests can be prioritized: you can test for meeting
requirements, test the structure or integration of components, test
the quality of execution and how well the application stays running,
or you can focus on testing the worst risks to a project. A unit test's
importance can depend on many different factors.
If unit tests are based on requirements, Mr. Glass pointed out that good
requirements are seldom available at the start of a project.
Requirements can also be ambiguous or have unforeseen gaps. For
example, when parsing XML, how should unexpected tags be handled? Should
they be silently ignored? Or flagged as exceptions? How do you know if
you're testing too little or testing too much? If you're verifying
exceptions, is it enough that an exception is thrown, the right exception
is thrown, or the right message is included with the exception? Should
warnings be tested or ignored as optional?
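To make those choices concrete, here is a sketch assuming Python's unittest and xml.etree.ElementTree modules. The order/item schema, parse_order() and UnexpectedTagError are hypothetical names I made up, and the sketch arbitrarily picks the "flag as an exception" answer. The three tests check the same failure at three levels of strictness, and nothing in the requirements says which level is enough:

```python
import unittest
import xml.etree.ElementTree as ET

ALLOWED_TAGS = {"order", "item"}  # hypothetical schema for illustration

class UnexpectedTagError(ValueError):
    """Raised when the document contains a tag the schema does not list."""

def parse_order(xml_text):
    root = ET.fromstring(xml_text)
    for element in root.iter():  # iter() visits the root element too
        if element.tag not in ALLOWED_TAGS:
            raise UnexpectedTagError(f"unexpected tag <{element.tag}>")
    return [item.text for item in root.iter("item")]

class ParseOrderTest(unittest.TestCase):
    BAD = "<order><item>tea</item><coupon>X1</coupon></order>"

    def test_an_exception_is_thrown(self):
        with self.assertRaises(Exception):
            parse_order(self.BAD)

    def test_the_right_exception_is_thrown(self):
        with self.assertRaises(UnexpectedTagError):
            parse_order(self.BAD)

    def test_the_right_message_is_included(self):
        with self.assertRaisesRegex(UnexpectedTagError, "<coupon>"):
            parse_order(self.BAD)
```

All three tests pass against this code. The first would also pass if parse_order() crashed for an unrelated reason, while the third breaks as soon as someone rewords the message.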
There are categories of errors that unit tests do not
detect on their own. Problems like numeric overflows, memory leaks, wrong units of
measurement,
rounding errors, stack overflows and buffer overruns are only caught if
a developer explicitly tests for them. Nor do unit tests validate the
application as a whole.
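A rounding error makes a small illustration. In this sketch (plain Python, with a hypothetical add_prices() function), the first test is the one a developer would naturally write, and it passes. The rounding problem only shows up when someone deliberately goes looking for it:

```python
import math

def add_prices(a, b):
    # Hypothetical example: prices stored as binary floating point.
    return a + b

def test_add_prices():
    # The obvious test passes: these values are exact in binary.
    assert add_prices(1.5, 2.25) == 3.75

def test_add_prices_rounding():
    # Only an explicit test exposes the rounding error.
    assert add_prices(0.1, 0.2) != 0.3
    # A tolerance-based comparison is one way to test such values.
    assert math.isclose(add_prices(0.1, 0.2), 0.3)
```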
Code coverage tools will not guarantee that the
unit tests will be the best unit tests. The tests are chosen by the
developer. The tests only cover the program's functionality, and the
testing may be lacking if the functionality is lacking. A test can
even be partly right and partly wrong because it doesn't fully
reflect the source code being tested.
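Here is a sketch of how that can happen, using a hypothetical average() function in plain Python. The single test below executes every line of the function, so a line-coverage tool would have nothing to complain about, yet an obvious failure case is never exercised:

```python
def average(values):
    total = sum(values)
    return total / len(values)

def test_average():
    # Executes both lines of average(): full line coverage.
    assert average([2, 4, 6]) == 4

def empty_list_fails():
    # The case nobody tested: an empty list divides by zero.
    try:
        average([])
    except ZeroDivisionError:
        return True
    return False
```

The coverage report measures which lines ran, not whether the developer thought of the right inputs.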
A code review will sometimes catch more bugs than a full set of unit
tests. Some kinds of errors cannot be caught with automated testing at
all.
Young developer Erik Snoeijs argues that doing tests defines your
input and output prior to writing functionality. But the opposite
is also true: writing functionality determines the inputs and
outputs for the tests. So I don't find that argument compelling
for better quality.
("Why I think test driven development suck" [sic], Erik Snoeijs).
Some TDD advocates say that if a program passes the
unit tests, it's ready to be put live into production.
Unit tests are a useful tool for testing software but they cannot
guarantee that the software is good, nor do they replace other forms
of testing. It is possible to write good unit tests without TDD.
So TDD doesn't produce better quality software.
Claim 4: TDD Makes All Languages Equally Good
I once worked with a programmer who claimed that choosing a good computer
language for a project didn't matter anymore because of unit tests.
Languages were all the same these days and if you throw enough tests
at a solution, you can be sure it works. So what difference does your choice
of language make?
This claim is more about unit testing than TDD. A good language with
strong features that promotes good programming practices can eliminate
the need to do a lot of testing and can give you more confidence in
your project. In "The Business Shell in an Age of Hype", I mentioned
the Stephen F. Zeigler study which compared a large, identical project
developed in two different programming languages: one language delivered
the project in half the time and with many times fewer bugs in the final
product. So your choice of language really does affect delivery.
A good process does not eliminate the need for a good language, and
a good language can save a lot of development time.
Claim 5: TDD Works Effectively on Large Projects
An implicit claim is that TDD works well for projects of any size,
whether one person creating a small web site or a million-line
application with a team of 70 programmers.
First, as I already mentioned, TDD is often accompanied
by a lack of up-front planning. Without planning,
responsibility cannot be partitioned across a large number of
people. This increases the dependencies between people and creates
more priority conflicts and refactoring disputes.
I also mentioned that leaving design considerations until the software must be refactored
makes more work. Design changes are made most cheaply when they are
undertaken as early in a project as possible (Glass). For a large
project with high cost and complexity, TDD's approach of having no
up-front design strategy may create large costs and delays.
Focusing too much on unit test coverage for bug removal can create
design problems. As I pointed out earlier, TDD delays refactoring
until late in the development cycle, making it an easy target to
ignore as a "nice to have".
Good documentation and training are more important as a
project gets big.
Besides these concerns, TDD requires programmers to remain idle while unit tests are run.
All unit tests must be run since there's no guarantee—even with object mocking—that a change will
not break unit tests for a different part of the program. Rod Coffin writes
"The typical Red/Green/Refactor TDD cycle lasts around 5-10 minutes, and
the developer usually manually runs the test that is driving the cycle
between coding and refactoring iterations. This means that although the
feedback loop is very quick, unintended side effects could break other
tests and the developer might not receive this feedback until the entire
test suite is run."
("Raising the Bar with Continuous Testing")
On a large project, running a set of unit tests can take a lot of
time (possibly hours) during which the programmer cannot proceed.
An extensive unit test suite is often larger than the program itself.
Jacob Proffitt points out that—as a rule of thumb—20% of tests give 80%
of the value (TDD or POUT (Plain Old Unit Testing), Jacob Proffitt, The Run Time).
Depending on the time and money available, it may simply not be practical
to use extensive unit testing on a large project when most of the tests
have little return for the investment.
A peer review can take less time than writing extensive unit tests and
can catch more bugs. Nor are all errors equal. Tom DeMarco in "Slack"
observes that perfect software is often a waste of time: users expect
bugs and accept them provided there are reasonable work-arounds. So,
depending on your application's needs, the business priorities and
bug removal techniques available, extensive unit tests may be wasteful.
TDD may become impractical when the test suite becomes large.
TDD assumes the programmer and the test writer are the same person.
Having a large test suite may become so
burdensome that dedicated developers may have to be hired to write and
manage the unit tests. The developers are no longer writing the tests.
This breaks the spirit of TDD, where the developer writes the tests and
the functionality in a tightly integrated way. When maintaining a large
unit test suite (larger than the program itself), the biggest refactoring
cost may be updating the tests themselves. This is especially frustrating
if 80% of these tests have little value.
Depending on the project, a fanatically tested program delivered
too late may not have the same value as a moderately tested program
delivered on-time.
Extensive unit testing can work against language features
for scaling-up applications.
Developers may, for example, make everything "public" in their objects
to make testing easier, or force all functions to return a value that can be
tested (TDD, Wikipedia). Although
unit tests do not explicitly endorse such practices, "just get the tests to pass"
is a common motto for cutting corners. Not using the programming language
effectively makes large projects riskier
and more costly to develop and maintain as they grow larger.
There are several aspects of strict TDD that may work against
development on a large project.
Claim 6: TDD Reduces Initial Development Time
Many TDD advocates believe that developers can get more work
done using TDD.
Getting more work done is the driver behind many new
technologies and techniques. The belief that TDD speeds development is usually
justified by the lack of documentation, training and up-front design. I've
already shown that documentation and training are necessary parts of any
project, and that skipping up-front design can make a lot more work as a
project deadline looms. So TDD can actually slow development.
Another justification comes from the abundance of unit tests:
bugs can be quickly caught so programmers can spend less time testing their
software.
Remember: unit tests are not the same thing as TDD. To answer
this claim, consider that TDD requires unit tests and a program, but enforces
writing tests up front. If TDD and non-TDD approaches both end up
with the same tests and program in the end, will the TDD approach...writing
tests up front...produce a program faster than writing the tests
concurrently or immediately after each bit of functionality is added? It
is unlikely.
A program and its tests
compose one integrated solution. Since a test cannot run without the
corresponding functionality, and the functionality can't be validated
without the corresponding unit test, it seems unlikely that doing tests
up front will make development time significantly faster (or worse, for
that matter). The same solution must be written at some point.
As I mentioned before, there are aspects of TDD that may
not work well with large projects, such as writing unit tests that have
low debugging value, the time it takes to run unit tests, etc. These will
slow a developer's work.
InfoWorld Editor Andrew Binstock raises a different concern:
TDD is not a natural way to think. Breaking up a program by unit tests
can be disruptive to solving the larger problem (cited below).
There's a lot of evidence to suggest TDD will slow development,
not make it faster.
So is TDD any good?
I've spent a couple of months investigating TDD. There are
several claims about TDD that are questionable. Despite the defenses of
people like Gary Bernhardt
(The Limits of TDD),
a little examination shows that these claims are dubious. A team looking to
use TDD should weigh these claims carefully.
Behind the hype, are there things that TDD handles well?
Let's take a look at some cases that may not be as impressive-sounding
as the hype but may deliver true value to a business.
Good Use 1: TDD is good for Java
The Java community has a lot of interest in TDD.
Mr. Binstock argues that TDD works well for teaching Java
due to Java's
hard-to-read error messages and illegible stack traces. These can
be daunting to people learning Java. By forcing unit
tests to be done for tutorials, a learner can get more useful error
messages back as they type up example programs. However, this raises
the question of what happens when a student's unit tests—often larger
than the program itself—throw stack traces
(Learning Java Via TDD: An impressive approach). So I don't find this argument
compelling.
More significant are Java's weak features. More
powerful languages have features for designing large applications. TDD
is used as a workaround: the tests provide a crude specification of
a project where the language has no specification or high-level organizing
features.
An additional problem with Java is the practice of installing
third-party classes. With the propensity of many Java developers to grab
plug-in classes off the Internet, TDD may help to ease difficulties in
redesigning or switching classes, which can hide implementation surprises
across similar classes, while quickly confirming the application still
functions in the expected way before testing all the features manually.
This is an imperfect solution. If your project depends on unit
tests to take the place of specification features, you need a long-term solution
that provides these features. If you're stuck with Java, TDD provides
a bandage to slow the bleeding.
Good Use 2: TDD may help with Continuous Integration
Some companies want their software to be able to be released at a moment's
notice. Continuous integration (CI) refers to automated building of the
software, such as after submitting changes or on a fixed schedule. This
ensures all parts compile together or that the overall functionality works.
(Whether CI meets all its claims, and what the hidden costs are, is a subject
for another blog.)
Rob Harwood argues that you can't do CI without TDD
("Faster Feedback and Why You Want It").
However, TDD focuses on unit testing and doesn't address when or how
integration tests (or other forms of error removal) should be handled.
One advantage of TDD unit tests is that they
act as a safety net if incomplete code has been checked into the source
repository. This is particularly a problem with source control software
like SVN that has only one shared repo. Using TDD, the software is
always ready to be regression tested the moment an immediate deployment
is announced. The unit tests can be run prior to the integration process.
In the best case, tests will fail for incomplete features that are
accidentally checked into the code base (since tests are written first).
Those features can be quickly identified and disabled.
If rapid, tested releases
are mandatory, then these trade-offs may be worth it for a business.
I used the word "may" because this is a rare case.
Using source control software that supports local saves
(e.g. Git) reduces the risk of committing incomplete work.
And the drawbacks of TDD (e.g. being a liability for large projects)
may outweigh the advantages in some CI scenarios.
Good Use 3: For Variety
As strange as it sounds, in most projects TDD has little positive or negative
impact. Developers may want to write their tests first just to "change things
up a little".
Mr. Binstock argues that TDD works in the opposite
way that people think: people enjoy building solutions not building problems
and TDD is about making many trivial, uninteresting problems and solving them
without foresight. TDD makes development mechanical, unchallenging and
uninteresting. So once the novelty wears off, a team may want to switch to
another development method.
Good Use 4: TDD is good where Testing is Neglected
The greatest value from TDD is one that I haven't read in any
blogs: improving quality by pushing back against business schedules.
In a company with realistic schedules, unit testing under TDD
is much the same as unit testing without TDD. As Andrew Dalke writes in his
blog, "good testing practices without TDD would have given the same [positive]
results."
(Problems with TDD, Andrew Dalke).
Aggressive schedules (that is, unrealistic, fantasy
schedules) can be enforced by either panic-stricken management or
over-confident developers. Error removal often takes 30% to 40% of
development time (Glass). When debugging is treated as a "nice to have", not a
requirement, error removal gets reduced or abandoned altogether. Releasing
shoddy work is a threat to the developer's career.
By demanding unit testing up-front, some testing gets done
before the functionality is written. Since the project cannot be released
without the functionality, management will have no choice but to delay the
project until the functionality is implemented and has undergone unit testing,
even if peer reviews, QA testing and other forms of bug removal are neglected.
So TDD is a defense for a developer to protect his/her reputation.
Conclusion
When astronomers talk about their high levels of certainty
of finding planets, they are referring to how certain they are that their
machines work, or the authenticity of their techniques, not how certain a
planet is really there. Since we
don't know how many solar systems there are out there, we can't know the
effectiveness of the search for them. Nevertheless, people get excited
about the claims made by astronomers and take them out of context.
In software development, people are desperate for a miracle, a new technique
or technology that will make programming more effective. As Robert L. Glass
has pointed out, most "breakthroughs" are really hype—unrealistic and
unsubstantiated claims.
(The Business Shell
in an Age of Hype, The Lone Coder).
The truth is, TDD is an effective way to fight aggressive
schedules. If you're using Java, or are using CI, or are stuck in a rut and
want to try a different process, it might have some value. But don't
expect "testing before coding" to produce miracles: in many cases, TDD
will have little benefit.
TDD has its downside. The process tends to work against good testing
and planning. It may be a liability to large-scale, complex projects. And,
in the long run, it may simply bore developers and encourage them to move on
to their next job.
Most TDD
advocates stop short of TDD curing cancer or raising the dead. Mr.
Harwood brags that TDD is the "next level beyond basic unit testing".
Others claim it eliminates documentation, training and requirements gathering,
improves quality or accelerates development.
With any unfamiliar technology or technique, evaluate its
strengths and weaknesses and form an opinion. But beware of outrageous
claims and examine them carefully.
Now if only there was "comment driven development" to ensure work is properly
commented before starting to write code.