The Lone Coder: Reflections for the Unsung Linux Saviours
by Ken O. Burtch
Unit Tests: A Pound of Prevention?
In the first place, as an ounce of prevention is worth a pound of cure, I
would advise 'em to take care how they suffer living coals in a full
shovel, to be carried out of one room into another, or up or down stairs,
unless in a warmingpan shut; for scraps of fire may fall into chinks and
make no appearance until midnight; when your stairs being in flames, you
may be forced, (as I once was) to leap out of your windows, and hazard
your necks to avoid being oven-roasted.
-- Benjamin Franklin, discussing fire safety, February 1735
A few people laughed at my comments on safe driving
and sharing the road in my article
"Dark Architecture" (Lone Coder, November 2009). Fog lights at night are
dangerous. The situation is growing worse in Toronto with the popularity
of illegal headlights. These lights are three or more times
more powerful than allowed by law, bright enough to cause physical pain in
the eyes of other drivers. Supporters of the illegal lights like the
benefits of driving at night with day-like vision and being the center of
attention. How do you evaluate the risks and the benefits, especially
when the risks are shouldered by the people around you, not yourself?
A recent CAA study reports that Canadians are increasingly
indifferent to the drivers around them, with more cases of using electronic
devices, tailgating and refusing to signal when turning
(More drivers leaving courtesy at the curb, CTV). How risky is this? "It's a matter of life or death for
hundreds of Canadians every year," a CAA spokesperson said. A survey by the
U.S. NHTSA showed that most people don't believe that speeding is dangerous:
they think they can handle excessive speed even if the other drivers on the road
can't (Drivers' Beliefs about Unsafe Driving, NHTSA).
There can be a disconnect between cause and effect.
According to the U.S. Centers for Disease Control, driving is the leading cause of accidental
death and the most dangerous task performed by average people. Two thirds of
drivers will be involved in some kind of accident during their lives. The
trend towards distracted, discourteous and aggressive driving increases the
risk of serious injury or death.
Robert X. Cringely related his own bad driver horror story this week
("Our Own Worst Enemies", Cringely.com).
I spoke with a trauma nurse at an Ontario hospital. "85% or
90% of our cases are from car accidents," she said.
When a head crashes through the windshield, it
inevitably causes brain swelling and permanent brain damage. Once smart and
successful people are left with altered personalities, memory loss and an inability
to perform simple tasks. "Every day you have to put a brush in their hand and explain
'this is how you brush your hair'. By tomorrow, they will have forgotten and
you start over again. It's the worst part of my job." Brain injuries are
largely preventable by wearing a seat belt and driving safely within the law.
Unit tests aren't much different. How do you assess their risks and benefits, especially when the risks may not be to you? What is the right
amount of unit testing?
Unit Tests Everywhere
I last discussed unit tests in
"Why and When To Use Test-Driven Development Effectively".
Unit tests are low-level, code-level "pinhole" tests that focus on
validating an isolated piece of a program. This is sometimes called
"white box testing" because you don't treat the program as an unknowable
"black box". Besides TDD, they can be useful in cases such as:
Error-prone Programming Languages. Some programming languages are
more difficult to debug than others. Unit tests provide an additional
tool for bug removal.
Large projects. When a project becomes large, testing the program as
a single application may be too difficult.
High Reliability Applications. With some problem domains (such as
air traffic control, banking or space exploration), there is a very high
price for uncaught bugs. Such domains need to remove as many bugs as
possible.
Continuous Integration. Running a set of detailed tests after every software
update, or once a day, to check for new bugs. Unit tests are easily
automated and provide quick feedback.
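As a minimal sketch of such a "pinhole" test, written in C++ with only the standard assert() macro rather than any particular testing framework, here is a hypothetical clamp_percent() function checked in isolation. The function and test names are illustrative only.

```cpp
#include <cassert>

// Hypothetical function under test: limits a percentage to the range 0..100.
int clamp_percent(int value) {
    if (value < 0) return 0;
    if (value > 100) return 100;
    return value;
}

// The unit test exercises this one function in isolation: no database,
// network or user interface is involved, so it runs in microseconds and
// is easy to automate in a continuous integration build.
void test_clamp_percent() {
    assert(clamp_percent(50) == 50);    // a normal value passes through
    assert(clamp_percent(-5) == 0);     // too low is raised to 0
    assert(clamp_percent(250) == 100);  // too high is lowered to 100
}
```

A test runner, or simply main(), calls test_clamp_percent(); as long as NDEBUG is not defined, a failed assert() stops the program and names the failing line, which is the quick feedback an automated build relies on.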
Barriers to Unit Testing
Programmers don't like to write tests. They want
to solve problems, and tests don't solve problems; they find problems.
So tests aren't fun to write, and they hurt self-esteem when they succeed in finding a bug.
Cowboy coders see unit tests as cutting into their
speed, which they use as evidence of their worth. This makes unit testing
a hard sell.
Ironically, programmers are also reluctant to change
code that has a lot of unit tests because they hate updating
the tests. So unit tests and continuous integration
can stifle refactoring, and may even entrench bad design
(Selective Unit Testing - Costs and Benefits).
As I pointed out in the article on Test-Driven Development, writing
automated tests is a process of diminishing returns. Writing a lot
of unit tests means you have more confidence in the reliability of
your software. However, each test that simulates a rarer, more
unlikely occurrence gives you less payoff. The cost of maintaining
the test may even be worse than the cost of failing to catch a bug.
For example, consider a test for an out-of-memory
condition in a typical web application. This seems like a serious
error at first glance. A test to force a memory exception can be time-consuming
to set up and implement reliably. But these days, if the server can't
provide memory, the operating system is dangerously short of
resources. Handling this exception may not be worth the trouble when
your entire world is collapsing and your program will likely fail
even if you catch the exception. You may not get a good payoff for
the effort of writing the test.
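To see why such a test is costly to set up, here is a minimal C++ sketch. It assumes the component lets you inject its allocator (a hypothetical build_report() taking an Allocator callback); without plumbing like this, which exists only for the test, there is no reliable way to make memory allocation fail on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>

// Injectable allocator: extra design work done only to make the
// out-of-memory path testable.
using Allocator = std::function<char*(std::size_t)>;

// Hypothetical component: returns false instead of crashing when
// memory cannot be obtained.
bool build_report(std::size_t size, const Allocator& allocate) {
    try {
        char* buffer = allocate(size);
        // ... a real implementation would fill the buffer here ...
        delete[] buffer;
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
}

// Normal allocator used in production.
char* normal_allocate(std::size_t size) { return new char[size]; }

// Test double that simulates an exhausted heap.
char* failing_allocate(std::size_t) { throw std::bad_alloc(); }

void test_build_report_out_of_memory() {
    assert(build_report(64, normal_allocate));    // happy path
    assert(!build_report(64, failing_allocate));  // simulated out-of-memory
}
```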
Customers often don't care much about quality; that is,
they say they want a quality product until they have to pay for it.
Customers accept as many as 3 defects per 100 lines of code. The irony
is that quality software is usually built faster and cheaper: unit tests
can accelerate productivity
("Peopleware", Tom DeMarco and Timothy Lister, pg. 21-23).
Some Common Standards
Here are some principles of unit testing that most people agree
on:
All public or "API" features must be tested. They are the "contract"
that is intended for use by other people and these should be tested to
ensure they function properly.
Good and bad boundary cases should be tested. Edge testing is a
common principle in all forms of software testing. If a number is
expected to be 1 to 10, testing with 1 and 10 (and the bad cases 0
and 11) gives you confidence in the software.
It's not enough to test the "happy path": bad cases must be tested
as well. Testing bad cases usually requires more work than good cases.
Testing too many different things in one test makes the test less
useful. Tests should be focused on particular issues.
Unit tests don't replace other forms of debugging, such as code
inspections. Different error removal techniques catch different kinds
of errors. According to Robert L. Glass, if you want as clean a
program as possible, you need to use all the tools available to you.
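The 1-to-10 example might translate into the following minimal C++ sketch. The validate_rating() function and its use of std::out_of_range for rejected values are assumptions for illustration. The good edges and the bad edges are kept in two separate, focused tests, so a failure points at one issue.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical function under test: accepts a rating from 1 to 10
// and throws std::out_of_range for anything else.
int validate_rating(int rating) {
    if (rating < 1 || rating > 10) {
        throw std::out_of_range("rating must be 1 to 10");
    }
    return rating;
}

// Helper: true if validate_rating() rejects the value.
bool rejects(int rating) {
    try {
        validate_rating(rating);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Good boundary cases: the smallest and largest legal values.
void test_rating_accepts_edges() {
    assert(validate_rating(1) == 1);
    assert(validate_rating(10) == 10);
}

// Bad boundary cases: one step outside each edge.
void test_rating_rejects_outside_edges() {
    assert(rejects(0));
    assert(rejects(11));
}
```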
As a general rule of thumb, 3 to 5 lines of unit tests
are required to test one line of a program, so for 100% code
coverage you should expect a test file 3 to 5 times the size of the file
you are testing. In other words, 1000 lines of unit tests cover about
a quarter of a 1000-line program
(Unit Testing, Wikipedia). However, keep in mind that even 100% coverage doesn't mean that
your project is perfect: tests may be flawed or outdated, and they may not test non-functional
issues such as performance.
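The "about a quarter" figure follows from simple division, taking 4 test lines per program line as an assumed midpoint of the 3-to-5 rule of thumb:

```latex
\text{program lines covered} \approx
  \frac{\text{unit test lines}}{\text{test lines per program line}}
  = \frac{1000}{4} = 250 \;(\approx 25\%),
\qquad
\text{range: } \frac{1000}{5} = 200 \text{ to } \frac{1000}{3} \approx 333
```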
Code coverage tools that show which statements haven't
been tested can be helpful if you want to check that the most serious
paths and most dangerous conditions have been tested.
Testing Non-Public Types
A common issue encountered by people who are new to unit
testing is what to do about encapsulation and access restriction. The testing functions,
as an outsider, normally cannot access the content of private or protected
items. Access restrictions are purely voluntary (a program doesn't need
them at all to function) and programmers use them for multiple reasons as
they see fit. As a result, opinions vary widely on the best way to handle this
situation.
Strategy 1: Don't Test Private or Protected Items, Only Public Ones
This approach side-steps the issue. If non-public
items define the implementation (how work is performed), then they shouldn't
be tested at all. Stick with black box testing and be concerned only with how the program component appears
to the outside world. In addition, since public items are less likely
to change, this approach means less work maintaining the tests.
This is a false dichotomy.
While public items ought to be slow to change, that doesn't
guarantee that they won't change, nor does it follow that every
non-public item will change too often to justify testing. In addition,
the implementation consists of all the executable statements, regardless
of their access restrictions.
The main problem is that, for many large classes, most of
the functionality will be behind access restrictions. Public-only tests
may have a hard time creating test cases to cover the code, and even then,
the tests may not be able to give specific error messages about issues. A
unit test needs not only to test the function but also to provide meaningful
feedback.
If you are using Test-Driven Development, white box
testing is pretty much a requirement.
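Under Strategy 1, a private helper is covered only indirectly. In this hypothetical C++ Article class (the names are illustrative), the test never touches to_slug_char(); it checks the public slug() result and relies on that to exercise the helper.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class: the public slug() method relies on a private helper.
class Article {
public:
    explicit Article(std::string title) : title_(std::move(title)) {}

    // Public "contract": a lowercase, dash-separated version of the title.
    std::string slug() const {
        std::string result;
        for (char c : title_) {
            result += to_slug_char(c);
        }
        return result;
    }

private:
    // Implementation detail: never called directly by the test.
    static char to_slug_char(char c) {
        if (c == ' ') return '-';
        if (c >= 'A' && c <= 'Z') return static_cast<char>(c - 'A' + 'a');
        return c;
    }

    std::string title_;
};

// Black box test: the private helper is exercised only through slug().
void test_slug() {
    assert(Article("Lone Coder").slug() == "lone-coder");
}
```

If this test fails, the message points at slug() rather than at the helper, which is the feedback problem described above.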
Strategy 2: Use Privileged Methods
In some programming languages, there are privileged methods
that can call another method, ignoring the access restrictions. In Ruby,
the send() method does this.
Another possible technique (in PHP) is using __call().
This "magic method" is executed (if
it is declared in a class) whenever a method of the class cannot be
called. For example, __call is invoked for a method that doesn't
exist or one that cannot be accessed because it is private or protected.
__call can then check a unit testing flag and run the non-public method
if tests are underway
(No Carrier, Garfield Tech).
This solution won't work for languages that do not have
this capability. The __call approach takes some effort to set up: you
need to use get_class_methods() to get a list of the class's methods and
check whether your function is in that list. If you set up __call in a
parent class, you'll also need a __call in each child class, because the
parent's __call runs in the parent's scope and cannot call the child's private methods.
Strategy 3: Use Friend Classes
When a C++ class declares another class or function as a "friend",
that friend can access the class's private and
protected members
(Friendship and inheritance, CPlusPlus.com).
class SomeClass {
public:
    int foo;
private:
    int bar;
    friend class SomeClass_UnitTests;  // the test class may read bar
};
This is convenient if your language supports it,
but many languages do not have friend classes. In C++, using a
friend class means your unit tests will be in your compiled program
unless you use preprocessor directives to remove them. These
directives clutter the code and alter the program you are testing.
You run a risk that a directive mistake will hide or cause bugs.
(What is wrong with making a unit test a friend of the class it is testing?, StackOverflow).
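The preprocessor approach might look like the following minimal sketch, where UNIT_TESTING is an assumed macro name set only by the test build. It shows the extra directives that must wrap both the friend declaration and the test class.

```cpp
#include <cassert>

// Assumption: the test build passes -DUNIT_TESTING to the compiler.
// It is defined here only so this sketch compiles on its own.
#ifndef UNIT_TESTING
#define UNIT_TESTING
#endif

class SomeClass {
public:
    int foo = 1;
private:
    int bar = 2;
#ifdef UNIT_TESTING
    // Present only in the test build, so the release build keeps the
    // original access rules.
    friend class SomeClass_UnitTests;
#endif
};

#ifdef UNIT_TESTING
// As a friend, this class can read SomeClass's private members directly.
class SomeClass_UnitTests {
public:
    static void test_bar_default() {
        SomeClass obj;
        assert(obj.bar == 2);
    }
};
#endif
```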
Strategy 4: Create a Testing / Accessor Subclass
Use the natural capabilities of object classes. Take
a class and extend it with a subclass containing wrapper functions
providing access to the non-public items. The class being tested is
left unaltered. The test classes only have to be included for unit
testing.
class SomeClass {
    public $foo;
    protected $bar;
}
class SomeClass_UnitTests extends SomeClass {
    // wrapper exposing the protected property for tests
    public function getBar() {
        return $this->bar;
    }
}
Using a child class means we can access protected items but not
private ones. The wrapper functions are separated from the items
being wrapped, which may make them harder to read or track. The wrappers
must be updated when the items they wrap are changed.
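The same accessor-subclass idea in C++, as a minimal sketch reusing the SomeClass names from the PHP example (the member types and default values are assumptions):

```cpp
#include <cassert>

class SomeClass {
public:
    int foo = 1;
protected:
    int bar = 2;
};

// Test-only subclass: wraps protected state in public accessors.
// SomeClass itself is unaltered; its private members (if it had any)
// would still be out of reach from here.
class SomeClass_UnitTests : public SomeClass {
public:
    int getBar() const { return bar; }
};

void test_get_bar() {
    SomeClass_UnitTests obj;
    assert(obj.getBar() == 2);
}
```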
Strategy 5: Create Wrapper / Accessor Functions
Instead of using a child class, place wrappers in the
class with the non-public items. The wrappers can be placed immediately
before or after the items they refer to. Both private and protected
items can be accessed.
class SomeClass {
    public $foo;
    private $bar;
    // test-only wrapper placed next to the item it exposes
    public function getBar_Test() {
        return $this->bar;
    }
}
The wrapper functions clutter the code and must be updated
when the items they wrap are changed. The wrappers will be in the released
application and could be exploited unless they are removed with preprocessor
directives. This has the same risks as Strategy 3.
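In C++ the wrapper can be compiled out of the release build with a preprocessor guard. A minimal sketch, assuming a UNIT_TESTING macro set only by the test build:

```cpp
#include <cassert>

// Assumption: the test build passes -DUNIT_TESTING; it is defined here
// only so this sketch compiles on its own.
#ifndef UNIT_TESTING
#define UNIT_TESTING
#endif

class SomeClass {
public:
    int foo = 1;

private:
    int bar = 2;

#ifdef UNIT_TESTING
public:
    // Test-only wrapper, placed right after the item it exposes.
    // It disappears from builds where UNIT_TESTING is not defined.
    int getBar_Test() const { return bar; }
#endif
};

#ifdef UNIT_TESTING
void test_get_bar_wrapper() {
    SomeClass obj;
    assert(obj.getBar_Test() == 2);
}
#endif
```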
Strategy 6: Redefine Everything to Public when Testing
In a language with a preprocessor (like C++), use the
preprocessor to rewrite "private" and "protected" to "public" before
running tests. This can be controlled by a variable passed to the
language compiler/interpreter. This has minimal impact on the source
code.
#ifdef UNIT_TESTING
#define private public
#define protected public
#endif
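This requires a language with a preprocessor, since doing
a simple search-and-replace (for example, with sed) runs the risk of rewriting
anything that contains the text "private" or "protected". Even in C++, the standard
forbids a program from #defining names identical to keywords, so compilers and
libraries are not obliged to cope with this trick. Technically, you
are altering the program you are trying to test (even though you are only
altering the access restrictions).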
Strategy 7: Use Reflection to Change Access at Run-time
In a language with class reflection (the ability to read and change
object properties while a program is running), alter the object being
tested and make non-public items public for testing. The reflection
is executed in the unit test so the original source code is unchanged.
Since reflection changes the running state of an object, using reflection
in general may make white box testing easier (such as the ability to use
test doubles). For example, in PHP 5.3:
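$refObj = new ReflectionClass( "MyClass" );
$refProp = $refObj->getProperty( "MyPrivateVar" ); // or getMethod()
$refProp->setAccessible( true ); // needed before PHP 8.1; no effect from 8.1 on
$value = $refProp->getValue( new MyClass() ); // read the private property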
This requires a language with class reflection capability.
The unit tests take more code to write than with other solutions.
Reflection can be error-prone and hard to maintain because the parameters
to reflection functions are strings and because it can be
difficult to read and follow. Reflection may disable your language's
compile-time class checks, making it the responsibility of your unit tests
to test these conditions as well. Technically,
you are altering the program you are trying to test (even though it's only
the access restrictions).
(PHP Manual; "How to use PHP's ReflectionClass to Test Private methods and Properties with PHPUnit";
"Is it bad practice to use Reflection in Unit testing?", StackOverflow).
If you want to generate test doubles automatically at run-time,
you will probably need reflection.
Strategy 8: Move the Non-Public Items to a New Class
This solution argues that the ability to test your
software should be a part of its design. Non-public items are probably
helpers/utilities. Store them in a separate class instead, keeping only
a private object pointer to that class. "Private items mean there's a
new class struggling to get out."
A program should be testable, but that doesn't mean that the
program should be structured around testing.
A program should be broken down around code reuse and readability.
Applied strictly, this rule could turn a large class into dozens of smaller ones.
Excessive use of classes to solve
every problem has led to the "class hell" problem in many languages, where
the class tree is bureaucratic and the reasoning incomprehensible. Refactoring
should not be done to make testing easier at the expense of the solution
(Unit Testing Private Variables, Binstock; Artima;
What is an Agile Language?, Lone Coder).
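A minimal sketch of Strategy 8 in C++, using hypothetical Invoice and PriceFormatter classes: the former private formatting helper becomes its own small class with a public interface, and Invoice keeps only a private pointer to it (here a std::unique_ptr). Amounts are assumed to be non-negative.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Former private helper, promoted to its own class so it can be
// tested directly. Assumes a non-negative amount in cents.
class PriceFormatter {
public:
    std::string format(long cents) const {
        std::string digits = std::to_string(cents % 100);
        if (digits.size() < 2) digits = "0" + digits;  // pad to two digits
        return "$" + std::to_string(cents / 100) + "." + digits;
    }
};

// The original class keeps only a private pointer to the helper.
class Invoice {
public:
    Invoice() : formatter_(std::make_unique<PriceFormatter>()) {}
    std::string total_line(long cents) const {
        return "Total: " + formatter_->format(cents);
    }
private:
    std::unique_ptr<PriceFormatter> formatter_;
};

// The helper is now public, so it gets its own focused test.
void test_price_formatter() {
    PriceFormatter f;
    assert(f.format(1234) == "$12.34");
    assert(f.format(5) == "$0.05");
}
```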
Strategy 9: Use Public (or Java Package) Access
In this case make everything public. Since access restrictions
are voluntary and do not change the function of items, just don't use them.
Without access restrictions, you lose encapsulation. This
makes your project harder to read and maintain over time.
The best strategy of the nine listed here really depends
on your project, programming language and requirements.
Are Test Doubles Having the Last Laugh?
Opinions vary on where and how often to use test
doubles. "Mock objects are both really damn useful and ridiculously
annoying at the same time", says Gregory Brown.
Mocking means testing a component without testing its
dependencies by creating fake objects that mimic the dependencies.
For example, if you want your unit tests to run without a database,
you might create a mock database object that returns the data the database
might have returned. Now your unit tests can run without a
real database. Your tests also get a performance boost since they
don't have to access a real database.
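A minimal C++ sketch of the database example, assuming the component talks to the database through an abstract Database interface (all names here are hypothetical). The test double is a full in-memory replacement that keeps state between calls; many authors would call this kind of double a fake rather than a mock.

```cpp
#include <cassert>
#include <map>
#include <string>

// Interface the component depends on.
class Database {
public:
    virtual ~Database() = default;
    virtual void save(const std::string& key, const std::string& value) = 0;
    virtual std::string load(const std::string& key) const = 0;
};

// Test double: mimics the real database with an in-memory map and
// keeps state between calls, so writing it is real work.
class FakeDatabase : public Database {
public:
    void save(const std::string& key, const std::string& value) override {
        rows_[key] = value;
    }
    std::string load(const std::string& key) const override {
        auto it = rows_.find(key);
        return it == rows_.end() ? "" : it->second;
    }
private:
    std::map<std::string, std::string> rows_;
};

// Component under test: knows only the Database interface.
class UserSettings {
public:
    explicit UserSettings(Database& db) : db_(db) {}
    void set_theme(const std::string& theme) { db_.save("theme", theme); }
    std::string theme() const {
        std::string t = db_.load("theme");
        return t.empty() ? "light" : t;  // default when nothing is stored
    }
private:
    Database& db_;
};

void test_user_settings_theme() {
    FakeDatabase db;
    UserSettings settings(db);
    assert(settings.theme() == "light");
    settings.set_theme("dark");
    assert(settings.theme() == "dark");
}
```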
In reality, setting up and maintaining mocks is a lot
of work. You are, essentially, creating another class to do exactly
the same things the original class does, including implementing its
public interface and maintaining state information. To mock a database
class, you have to create a second database class that doesn't actually connect
to a database. Mocks are often criticized for being an overkill
solution, or because programmers oversimplify them. In this
example, a second database for testing can be installed and configured
in a few minutes, perhaps using in-memory tables, as opposed to
spending several weeks writing a realistic, complete database mock class.
An easier method is to use a stub. This is a mock
object that has no logic and returns predefined responses. If your
database schema changes a lot, the stubs have to be kept up-to-date.
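By contrast, a stub can be a few lines. In this hypothetical sketch, CustomerTableStub always returns the same canned name, with no logic at all:

```cpp
#include <cassert>
#include <string>

// Hypothetical dependency: looks up a customer's name by id.
class CustomerTable {
public:
    virtual ~CustomerTable() = default;
    virtual std::string name_for(int id) const = 0;
};

// Stub: no logic, just a canned answer. If the real schema changes,
// this canned data must be updated by hand.
class CustomerTableStub : public CustomerTable {
public:
    std::string name_for(int) const override { return "Ada"; }
};

// Component under test.
std::string greeting(const CustomerTable& table, int id) {
    return "Hello, " + table.name_for(id) + "!";
}

void test_greeting() {
    CustomerTableStub stub;
    assert(greeting(stub, 42) == "Hello, Ada!");
}
```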
Even a stub can be overkill. Suppose you want to
load some test data from a file. Is it worth it to create a stub
object? What are the odds you will want to run your unit tests
without a file system? In a pinch, you could use a ramdisk.
Often, it's enough to test the dependencies first, and
then test the classes that depend on them. If the dependencies pass
their tests, you are reasonably certain that they can be relied upon
in later tests.
If you are using a library that provides you with
mocks or stubs for free, then you might consider using them.
Such libraries often rely on reflection, which has the drawbacks I mentioned earlier.
There's always the possibility that the mocks and stubs will not be
perfect and will cause or hide bugs themselves.
In general, use test doubles only when you have a
real need, and when their limitations don't outweigh their benefits.
You can minimize your need for test doubles by not
declaring a lot of unnecessary objects in your component.
People are bad at evaluating risks and payoffs. This is true
in both driving and programming. Unit tests are usually white box tests
that do detailed checks to make sure your program is running smoothly and
adheres to the rules of the road. This is good for everyone.
Steve Sanderson writes, "For certain types of code, unit testing
works brilliantly, flows naturally, and significantly enhances the quality of the
resulting code. But for other types of code, writing unit tests consumes a
huge amount of effort, doesn't meaningfully aid design or reduce defects at
all, and makes the codebase harder to work with by being a barrier to
refactoring or enhancement."
(Selective Unit Testing - Costs and Benefits)
There is no magic formula for how many unit tests you
need. The benefits they provide depend on the problem you are solving,
the business requirements, the language you use and whether you are doing
continuous integration. Evaluate the risks and provide enough tests
to protect critical components or components that are complex and slow
to change. Evaluate the costs so that you do not create a lot of
redundant code and maintenance work without value. An approach that leans on
unit testing's strengths can improve your quality and productivity.
Like driving a car, try to look beyond the obvious.
Don't let your driving...or your programming...be short-sighted and
reckless in the face of unexpected, long-term costs.