The Lone Coder
Reflections for the Unsung Linux Saviours
by Ken O. Burtch
 

 Painful XML

There is an old saying in computers: if it's useless, it will become a standard. Whether it's error-prone C, bloated and bureaucratic Java or unreadable SQL, all of these became standards even though there were superior approaches that did a better job. Or consider the Pentium processor line: at heart a glorified 80486 with a small number of quirky registers, it remains the best-selling processor despite superior chips like the G series or DEC's short-lived 64-bit Alpha.

Under the same reasoning, XML (the eXtensible Markup Language) is sure to be a success.

EDI (electronic data interchange) formats, ways of representing data in a machine- or industry-independent way, have been around for decades. The publishing industry was one of the first to adopt EDI, spawning standards like the 70-character BISAC format, which traces its line length to the size of punch cards.

One of the more recent EDI formats was EDIFACT, developed for use in Europe. For maximum flexibility, EDIFACT used nested tags to represent data. For example, the meaning of a price tag depended on which other tags surrounded it. Does it sound like XML? EDIFACT was similar in this regard.

But EDIFACT was a failure for two reasons. Although nested tags provide great flexibility, it is incredibly hard to get meaning from them. You can write a stack-based or recursive program to parse the EDIFACT tags, but the meaning of a tag is not obvious on visual inspection. You can't even "grep" EDIFACT, since any single line is useless without walking through all the tags that appear before it.
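To make the "grep" complaint concrete, here is a minimal sketch in Python. It uses XML as a stand-in for any nested-tag format, and the document and its element names (order, item, shipping, price) are invented for illustration, not taken from EDIFACT or any real schema. A line-oriented search finds two identical-looking price lines; only a stack-based walk recovers what each price means.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names are invented for illustration only.
DOC = """
<order>
  <item>
    <title>Linux Shell Scripting</title>
    <price>29.95</price>
  </item>
  <shipping>
    <price>5.00</price>
  </shipping>
</order>
"""

def prices_with_context(xml_text):
    """Walk the tree with an explicit stack; return (path, value) for each price."""
    root = ET.fromstring(xml_text)
    results = []
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        if node.tag == "price":
            results.append((path, node.text.strip()))
        # Push children in reverse so they are visited in document order.
        for child in reversed(list(node)):
            stack.append((child, path + "/" + child.tag))
    return results

# What a line-oriented search (grep-style) sees: prices with no context.
grep_like = [line.strip() for line in DOC.splitlines() if "<price>" in line]

if __name__ == "__main__":
    print(grep_like)
    for path, value in prices_with_context(DOC):
        print(path, value)
```

The grep-style result cannot tell the item price from the shipping charge. The walk can, but only because it carries the full path of enclosing tags along with every value.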

A second problem with EDIFACT's nested tags was that the data couldn't be used directly. Databases use a relational representation of data, rows with columns, which is not compatible with nested tags. So it took complicated programs to produce EDIFACT files, and complicated programs to convert them back to flat-file data for loading into a database. This raises the question of why the information wasn't sent in a format more compatible with standard business practices in the first place.
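As a rough illustration of that conversion step, the sketch below flattens a small nested document into rows suitable for loading into a database table. Again XML stands in for the nested format, and the layout (orders, order, customer, item, isbn, price) and the placeholder values are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested data: the layout and values are invented for illustration.
ORDERS = """
<orders>
  <order id="1001">
    <customer>Acme Books</customer>
    <item><isbn>0000000001</isbn><price>29.95</price></item>
    <item><isbn>0000000002</isbn><price>12.50</price></item>
  </order>
  <order id="1002">
    <customer>Corner Shop</customer>
    <item><isbn>0000000003</isbn><price>8.00</price></item>
  </order>
</orders>
"""

def flatten_orders(xml_text):
    """Turn nested orders into flat (order_id, customer, isbn, price) rows."""
    rows = []
    root = ET.fromstring(xml_text)
    for order in root.findall("order"):
        order_id = order.get("id")
        customer = order.findtext("customer")
        # The parent's fields are repeated on every row, as a flat file requires.
        for item in order.findall("item"):
            rows.append((order_id, customer,
                         item.findtext("isbn"), item.findtext("price")))
    return rows

if __name__ == "__main__":
    for row in flatten_orders(ORDERS):
        print(",".join(row))
```

Even in this toy case the converter has to know the nesting in advance: which tags are parents, which repeat, and which values must be copied down onto each row. A flat format would have carried those rows as they are.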

XML is a data interchange format which, according to the book "Office 2003 XML", was heavily influenced and endorsed by Microsoft. XML also uses nested tags: those who do not learn from the past are doomed to repeat it.

Such a useless standard is doomed to become a standard, so you'd better learn it.

December 26, 2004 


 
     

PegaSoft Canada - A Linux Association Since 1994