The Lone Coder Reflections for the Unsung Linux Saviours
by Ken O. Burtch
The Diversi Effect
In the 1990's I owned an Apple IIgs computer. It was the
time of "Apple II Forever", when the Apple brand dominated schools
and Apple swore that the Apple II line would be supported along side
of the Mac. Which, as we now know, didn't turn out to be the case.
There were two disk caching systems I used on the Apple
IIgs.
The first was Diversi-Cache, a shareware program by Bill
Basham of DSR. This disk caching program read in and saved in memory the first
blocks on a disk, the blocks where the main volume table of contents were located
(Documentation).
The second disk caching program came with the memory card I purchased.
This caching program did an analysis of disk activity, rated blocks on their
access rates and optimized the memory usage.
Which program provided faster disk access?
Diversi-Cache.
This gives rise to a software design phenomenon that
I call the "Diversi Effect": sometimes the
simplest solutions are the most effective. In the case of Diversi-Cache,
it was faster because it cached the disk's main directory, the part of the
disk that was read the most. It did it simply and quickly.
It achieved the maximum benefits with the minimum work, "the most bang for
its buck". The other
program may have been more efficient in its memory management, more clever in its
calculations, but those efforts slowed the program. The slow down outweighed the
benefits.
Another possible reason for the Diversi effect is writing
software with "peephole optimizations". That is, a developer can outsmart
himself by trying to be clever with the individual pieces of his work while
creating something that, overall, runs slower because the overall vision was
lost. Noel Llopis aludes to the problems of being too clever in his article,
"Simple is Beautiful" ("Games from Within" blog).
An example of the Diversi Effect is in the Gen_List package
used in many of my open source projects. Gen_List is a simple linked list
package. There's an old software joke that the first thing every programmer
does is write a linked list package. At the time I was working with the
GCC Ada (GNAT 2.0) compiler, I wanted a project to teach myself Ada. Using
objects wasn't an option since objects were fully debugged and implemented
at that time.
I applied the Diverse Effect to the Gen_List package to try to make something
usable for my future projects.
List Structure. I used a single-link list. Most implementations of
lists used double-linked lists. Double-linked lists are slightly
slower but they allow easy movement in both directions. But, by the Diversi Effect
principle, 9 times out of 10, people only move in one direction through a list.
The double-linked list is overkill for most applications (with the exception
of large lists that must be searched in two directions). So I decided to use a
single-linked design.
Deletion Cache. Dynamic memory allocation
is one of the slowest operations on a computer. A list should never free up
deleted memory right away because reallocation is too slow: items should
be saved and recycled. Many list packages put
the deleted items into a separate list so that memory can be reused.
But, following the Diversi Effect, maintaining a list...inside
of a list...is complicated. When queuing items or sorting, usually only one
list item is removed at a time. This means the most benefit comes from caching
the first deleted item with diminishing returns for subsequent items. The
solution in Gen_List was to have a deletion cache for only one item. That is,
I used a variable not a list. The cost to see if the variable was null or
not is very low: it provided the most performance increase with the minimum
about of work.
Search Cache. Why search the list from the start every time?
What about moving through the list, first item to last? It would be good
to remember recent searches to improve search times. But the most benefit
is gained by remembering the position of the last search result.
For any search, check the last search result to determine if the search
should commence from the last position or from the start of the list.
For random searches on ordered data, this approach cuts search times in half.
When iterating through the list, it means the next item is immediately after
the saved position, making it fast without resorting to awkward workarounds
like list iterators. That is, as long as only one subprogram is searching
the list, which is the usual case.
As a result, the Gen_List package is small and fast. It
has optimizations, but those optimizations are small and target the most
common slow spots.
Is Gen_List the perfect linked list package? In my
Tiny IDE for Ada/Anything, source files are
stored in Gen_List lists. This is a case where a double-linked list
would be faster, especially for very large source files. Load a 6000+ line
source file into TIA and scrolling up is noticeably slower
than scrolling down. Because Gen_List is single-linked, moving up requires
Gen_List to search from the first line of the source file to find the
next line to display. Using the wrong data structure for a project is
never the right solution.
So if you are a developer, before starting your next project,
ask yourself what are the minimal optimizations you can make for the maximum
benefit? Sometimes those small optimizations will make a bigger improvement than
a complicated scheme. Make the Diversi Effect work for you.
« Truth Humility Communication Nobility Freedom Purity
Excellence Right Support Courage Compassion Quality Honesty Trust
Cooperation Challenge Education »
PegaSoft Canada - A Linux Association Since 1994