The Lone Coder: Reflections for the Unsung Linux Saviours
by Ken O. Burtch
Painful XML
There is an old saying in computers: if it's useless, it will
become a standard. Whether it's error-prone C, bloated and bureaucratic Java or
unreadable SQL, all of these became standards even though superior
approaches did a better job. Or consider the Pentium processor line:
still a glorified 80486 at heart, with a small number of quirky registers,
it remains the best-selling processor despite superior chips like the G
series or DEC's short-lived 64-bit Alpha.
By the same reasoning, XML (the eXtensible Markup Language)
is sure to be a success.
EDI (electronic data interchange) formats, ways of
representing data in a machine- or industry-independent way, have been around
for decades. The publishing industry was one of the first
to adopt EDI, spawning standards like the 70-character BISAC format, which
traces its line length to the size of punch cards.
One of the more recent EDI formats was EDIFACT, developed for
use in Europe. For maximum flexibility, EDIFACT used nested tags to
represent data. For example, the meaning of a price tag depended on
which other tags surrounded it. Does that sound like XML? EDIFACT was similar
in this regard.
But EDIFACT was a failure for two reasons. First, although nested
tags provide great flexibility, it is incredibly hard to get meaning
from them. You can write a stack-based or recursive program to
parse the EDIFACT tags, but the meaning of a tag is not obvious on visual
inspection. You can't even "grep" EDIFACT, since any single line is useless
without walking through all the tags that appear before it.
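To make the point concrete, here is a minimal sketch of the stack-based walk. It uses XML-style tags rather than real EDIFACT syntax, and the sample document and the function name prices_with_context are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the same <price> tag means something different
# depending on which tags enclose it.
SAMPLE = b"""<order>
  <item><price>9.99</price></item>
  <shipping><price>4.50</price></shipping>
</order>"""

def prices_with_context(data):
    """Return (path, text) for every <price>, tracking open tags on a stack."""
    stack = []   # names of the tags that are currently open
    found = []
    for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if elem.tag == "price":
                # Only the full stack says what this price actually is.
                found.append(("/".join(stack), elem.text))
            stack.pop()
    return found

for path, text in prices_with_context(SAMPLE):
    print(path, text)
```

Grepping this sample for "price" returns two lines that look alike. Only the stack tells you that one is an item price and the other a shipping charge.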
The second problem with EDIFACT's nested tags was that the data
couldn't be used directly. Databases use a relational representation of data,
rows with columns, which is not compatible with EDIFACT's nested tags. So
it took complicated programs to produce EDIFACT files, and complicated programs
to convert them back to flat file data for loading into a database. This
raises the question of why information wasn't sent in a format that was more
compatible with standard business practices.
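The conversion step looks something like the sketch below. Again, this is XML-style markup rather than real EDIFACT, and the order layout, the placeholder ISBNs and the function name flatten are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested order; the ISBN values are placeholders.
NESTED_ORDERS = """<orders>
  <order id="1001">
    <customer>Acme Books</customer>
    <item><isbn>0000000001</isbn><price>9.99</price></item>
    <item><isbn>0000000002</isbn><price>14.50</price></item>
  </order>
</orders>"""

def flatten(xml_text):
    """Turn nested orders into flat (order_id, customer, isbn, price) rows."""
    rows = []
    root = ET.fromstring(xml_text)
    for order in root.findall("order"):
        order_id = order.get("id")
        customer = order.findtext("customer")
        for item in order.findall("item"):
            # Values from the enclosing tags are repeated on every row,
            # because a flat table has nowhere else to put them.
            rows.append((order_id, customer,
                         item.findtext("isbn"), item.findtext("price")))
    return rows

for row in flatten(NESTED_ORDERS):
    print(row)
```

Every row repeats the order number and customer. That is the shape a database loader wants, and it is the shape the data probably had before someone wrapped it in tags.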
XML is a data interchange format which, according to the book
"Office 2003 XML", was heavily influenced and endorsed by Microsoft. XML also
uses nested tags: those who do not learn from the past are doomed to repeat
it.
Such a useless standard is doomed to become a standard so
you'd better learn it.
PegaSoft Canada - A Linux Association Since 1994