XML data management: Setting some matters straight, Part II

Fabian Pascal continues his arguments to counter the XML hype machine.

This is the second part of a series. Part 1 can be viewed here.

Let's consider Jelliffe's specific arguments. I stopped guessing at people's meaning when they say "physical"--logical-physical confusion is rampant in the industry--but Jelliffe's argument seems to be that XML is just a "physical format" (like ASCII or PDF?) for data exchange. If so, what exactly is the advantage of XML over any other such format? Jelliffe contends that XML is a "fairly rich and adaptable format" (adaptable to what?) and that it "has some nice performance characteristics." (Does text have the best performance characteristics for data exchange purposes?) Is there no other available format that is rich and adaptable, or has nice performance characteristics?

Jelliffe points to an example I quoted from an article by Bosak and Bray, two of the inventors of XML:

"... imagine going to an online travel agency and asking for all the flights from London to New York on July 4. You would probably receive a list several times longer than your screen could display. You could shorten the list by fine-tuning the departure time, price or airline, but to do that, you would have to send a request across the Internet to the travel agency and wait for its answer. If, however, the long list of flights had been sent in XML, then the travel agency could have sent a small Java program along with the flight records that you could use to sort and winnow them in microseconds, without ever involving the server. Multiply this by a few million Web users, and the global efficiency gains become dramatic."

and argues that:

"Mr. Bosak is saying merely it scales well to have a coarse-grained query made once ahead of time whose results can be viewed in different ways at the jet-setter's (presumably disconnected) leisure. The reason for this might be as simple as bandwidth or network availability. Mr. Pascal's comments that this represents a regression from centralized database management are mysterious to me, given that the example is based on the availability of centralized databases. What use is a centralized database when you are off-line at 30,000 feet?"

There is no reference to a "disconnected" situation in the quote, but let's ignore that. If an application program is sent with data to the client, then any format known to the application program would do! Note also that implicit in sending applications along with the data is the notion that the application developer knows in advance the operations users will want to perform on the data. Programmers tend to subscribe to this notion, but as Chris Date points out with respect to extending object-orientation from programming to databases: "As my friend David McGoveran once said to me, 'These object guys are all Platonic idealists -- they think there's only one way to look at the world.' But as database people, we know there isn't just 'one way to look at the world.' That's why objects and databases are a bad fit. (Did I hear someone say 'impedance mismatch'?)"

The point is that regardless of the amount of data transmitted (developing different technologies for small and large data is not a good idea; how small is "a small amount of data" anyway?), "fine-tuning the departure time, price or airline" is data manipulation, a DBMS function that is now performed by application programs. The reason we have databases and DBMSs in the first place is that we already tried to manage data via application programs and had to give it up. The fact that application programs now come with the data over the network does not address this fundamental issue and is hardly an improvement. Just the opposite: application developers cannot know the platform and load of every user to which data and applications are sent and cannot, therefore, optimize performance as the DBMS on the server can; besides, optimization is also a DBMS function. As I argued in my article, however "dramatic the global efficiency gains," equally if not more dramatic is the burden of developing and maintaining all the additional application code that would otherwise not be necessary.
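The distinction can be made concrete with a minimal sketch. It is an illustration only, not code from Bosak and Bray or from Jelliffe: the flight records, the element and attribute names, and both function names are hypothetical. The first function winnows the list the way a shipped application program would, with the filtering and sorting logic hand-coded in advance; the second states the same request declaratively and leaves its evaluation to a DBMS. SQLite is loaded from the XML here only to keep the example self-contained; on a server the DBMS would already hold the data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical flight records; the element and attribute names are invented.
FLIGHTS_XML = """
<flights>
  <flight airline="BA" departs="09:15" price="540"/>
  <flight airline="VS" departs="11:30" price="495"/>
  <flight airline="AA" departs="18:05" price="610"/>
  <flight airline="BA" departs="20:45" price="470"/>
</flights>
"""


def winnow_in_application(xml_text, max_price, latest_departure):
    """Filter and sort in application code: the logic is fixed by the developer."""
    rows = []
    for flight in ET.fromstring(xml_text).iter("flight"):
        price = int(flight.get("price"))
        departs = flight.get("departs")
        # HH:MM strings compare correctly as text because they are zero-padded.
        if price <= max_price and departs <= latest_departure:
            rows.append((flight.get("airline"), departs, price))
    return sorted(rows, key=lambda row: row[2])


def winnow_in_dbms(xml_text, max_price, latest_departure):
    """State the same request declaratively; the DBMS decides how to evaluate it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flight (airline TEXT, departs TEXT, price INTEGER)")
    conn.executemany(
        "INSERT INTO flight VALUES (?, ?, ?)",
        [
            (f.get("airline"), f.get("departs"), int(f.get("price")))
            for f in ET.fromstring(xml_text).iter("flight")
        ],
    )
    rows = conn.execute(
        "SELECT airline, departs, price FROM flight "
        "WHERE price <= ? AND departs <= ? ORDER BY price",
        (max_price, latest_departure),
    ).fetchall()
    conn.close()
    return rows
```

Both functions return the same rows for the same arguments, but any question the developer did not anticipate (say, the cheapest flight per airline) requires new application code in the first case and only a different query in the second.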

This notwithstanding, Jelliffe claims that I "completely misread 'application' as used in the XML 1.0 specification to mean 'application programs' in the database sense. In XML, an application may be a database system ...". Well, "may" means "not necessarily," and besides, the question remains: what is the advantage of XML as a physical format for data transfers, be it between application programs or DBMSs? (That XML proponents use the term "application" to refer also to DBMSs is instructive; see below.)

In my article I quoted Bosak and Bray as follows: "... People and companies want Web sites that take orders from customers, transmit medical records, even run factories and scientific instruments from half a world away. HTML was never designed for such tasks ... in essence, it describes how a Web browser should arrange text, images and push-buttons on a page... [And while] ... people can look at this page, see some large type followed by blocks of small type and know that they are looking at the start of a magazine article [or t]hey can look at a list of groceries and see shopping instructions [or t]hey can look at some rows of numbers and understand the state of their bank account, [c]omputers, of course, are not that smart; they need to be told exactly what things are, how they are related and how to deal with them." [emphasis mine] I argued that what the authors are talking about is nothing but database management, but Jelliffe disagrees:

"Mr. Pascal says 'the authors are talking about nothing but database management', but the examples in the Bosak & Bray article speak of taking orders, transmitting records, and running factories. The point about centralized functions is otiose [sic] -- data needs to get from A to B, and XML provides one way to do it."

Is Jelliffe suggesting that data transmitted for order taking, record transfers and running factories is not managed? I submit that transmission, though ephemeral, is nevertheless an integral part of data management. The transmitted data had better be managed and persist!

To be continued...

About the Author

Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. His third book is called Practical issues in database management: A guide for the thinking practitioner.
