An article that begins "Relational databases have been with us since Edgar Dodd [sic] first defined their characteristics in 1970" isn't very promising, is it? That's how the "A La Marge" column in the July issue
In the early 90's, the object-oriented database management system (OODBMS) was the fad in residence, as warehouses and XML are today. A representative quote from 1993 by "respected technical analysts" goes as follows:
"Although it is now certain that the next generation of databases will be object databases [oh yeah?], we cannot predict with any confidence which the dominant products will be ... one thing we can be sure of: They won't be relational at the physical level." -- Butler & Bloor, DBMS
Leaving aside the abysmal lack of understanding of relational technology: when relational proponents argued that OO had existed for quite a while in programming but had fundamental problems for data management, the response was that OODBMS technology was still "in its infancy" and had to "mature." We rarely hear about OODBMS's anymore. (Bloor and Butler are now in favor of a new "data model" which is not a data model either, see this article.) On the rare occasion when we do hear about them, they're still not ready for prime time, and for good reason: they can't be! Would you want to do data mining on top of objects? Or business rules? Or temporal databases?
I learned a long time ago that in the database field one cannot assume that because a position "seems right," the underlying arguments are necessarily relevant or correct. In fact, I have come across numerous occasions where practitioners or pundits expressed what seemed to be reasonable positions based on faulty or uninformed arguments. Let's consider some of those in the column.
"... data in a relational database are represented as two-dimensional tabular structures composed of columns and rows."
No. As Chris Date has repeatedly reminded us, R-tables are not "two-dimensional." Their pictures are constrained by space to be two-dimensional representations of the tables, but the R-tables themselves are not the same as their pictures and actually represent n-dimensional structures, where n is the number of columns.
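A minimal Python sketch may make the point concrete. The attribute names, domains, and values below are hypothetical, chosen only for illustration: each attribute contributes one dimension, the space of possible rows is the Cartesian product of the domains, and a relation is simply a set of points (n-tuples) in that space.

```python
from itertools import product

# Hypothetical domains (data types) for three attributes; illustrative only.
domains = {
    "emp_no": {1, 2, 3},
    "dept": {"sales", "ops"},
    "grade": {"A", "B"},
}

# The n-dimensional space is the Cartesian product of the domains;
# each possible row is one point in that space.
attribute_names = tuple(domains)
space = set(product(*(domains[a] for a in attribute_names)))

# A relation is a subset of that space: a set of n-tuples, with no row order.
employees = {
    (1, "sales", "A"),
    (2, "ops", "B"),
    (3, "sales", "B"),
}

n = len(attribute_names)       # number of dimensions
is_subset = employees <= space  # every row is a point in the space
```

Here n is 3, so each row is a point in a three-dimensional space, even though a printed picture of the table would still be a flat grid of rows and columns.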
"Over the past thirty years, several popular relational database management systems (RDBMS) have been developed that commonly use the SQL query language to quickly extract different views of the data within the database."
SQL DBMSs are not true and fully relational implementations, which means that they do not yield anywhere near the many practical benefits that relational technology can confer. Unfortunately, this incomplete and flawed distortion of the relational model is its only implementation by the industry, which is why most practitioners erroneously think that SQL DBMSs are RDBMSs.
"Before the data stored in an RDBMS [read: SQL DBMS] can be used in an object-oriented program, data in the tables must be mapped to the objects that use the data. If the objects use complex combinations of the data in the database, this mapping can cause otherwise efficient programs to stall. In an attempt to overcome this "impedance mismatch," several OODBMS's have been developed over the last decade. Data in these OODBMS's are stored within predefined objects. When an object-oriented recalls data from the database, it merely passes a reference to a particular object ID and the object is loaded, ready-to-use, directly into memory."
This paragraph reveals the common lack of understanding of fundamentals, complete with confusion of the logical (model) and physical (implementation) levels (see the normalization articles in this series). Having been invented as a set of "good programming" guidelines, object-orientation lacks a data model in the sense in which the relational model is one. Not only is the so-called "object model" not a data model, but it seems that no two OO proponents agree on what an object is, or provide a clear and precise definition for one. And that is predictable, given the lack of a theoretical foundation and the fact that OO was not intended for data management in the first place. This is one of the reasons why OO cannot be extended to database management. Thus, OODBMSs that store data in "predefined objects" are not only non-relational DBMS's but, in fact, are really "application-specific DBMS-building kits" so to speak, somewhat of a contradiction in terms -- see Chapter 1 "Careful What You Wish For: Data Types and Complexity" in my Practical issues in database management.
Note: "Object-model" is usually used to refer to a conceptual model of a specific enterprise, not a data model; for the confusion of three types of model see "Something to call one's own" forthcoming at Database Debunkings.
Whether OO programs are all "otherwise efficient" is highly debatable, but even if they were, efficiency has absolutely nothing to do with the data model employed -- which is purely logical. Performance is a pure implementation issue. Thus, to the extent that OO programs "stall," it is not because the DBMS is relational and the programs are OO, but rather determined entirely at the physical level by the DBMS implementation (SQL, not relational!), database and application design (rarely correct) and factors such as hardware, load, networking, etc. Incidentally, the so-called "impedance mismatch" usually refers not to performance gaps between OO programs and SQL DBMSs, but rather to the procedural nature of programming languages in general and the declarative nature of SQL (well, at least more declarative than programs; see next).
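The procedural/declarative contrast can be sketched in a few lines of Python using the standard sqlite3 module. SQL is used only because it is readily at hand; it is, as noted, merely more declarative than a program, not a relational language. The emp table and its values are hypothetical.

```python
import sqlite3

# Hypothetical in-memory table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "sales", 50), (2, "ops", 70), (3, "sales", 90)],
)

# Procedural style: the program spells out HOW to find the answer,
# fetching rows one at a time and filtering them in a loop.
procedural = []
for emp_no, dept, salary in conn.execute("SELECT emp_no, dept, salary FROM emp"):
    if dept == "sales" and salary > 60:
        procedural.append(emp_no)

# Declarative style: the request states WHAT is wanted and leaves
# the access strategy to the DBMS.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT emp_no FROM emp WHERE dept = 'sales' AND salary > 60"
    )
]
```

Both forms return the same employee, but only the second leaves the DBMS free to choose how the rows are located. The mismatch is between these two styles of expression, not between two levels of performance.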
"Although the use of an OODBMS resolves impedance mismatch, important problems arise. For example, object syntax varies across languages so object-oriented databases are only accessible from particular languages. Also, since data is encapsulated in objects, searching through different views of the data can be slow."
To the extent that one can speak of "impedance mismatch" in its traditional sense (it is not precisely defined and is used to cover a multitude of sins), the real solution is, of course, not to make the data language procedural, but to make the programming language as declarative as possible. OODBMS's do the exact opposite, which can hardly be called a solution. At least part of the reason why OODBMS languages vary also has to do with the lack of consensus on OO fundamentals. As to encapsulation, it raises issues much more serious than performance (the logical/physical confusion again). A major one is that OO encapsulation essentially means that there is no such thing as ad-hoc querying. Any operation (method) that the object programmer has not built into the object is simply not available to the user. In the relational approach, on the other hand, only some domains (read: data types, equivalent to OO object classes) encapsulate; relations (read: R-tables) -- where the rows (objects) are -- do not, which allows for ad-hoc querying.
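A rough Python illustration of the encapsulation problem follows. The EmployeeStore class, its single method, and the restrict helper are all hypothetical, and Python hides internal state only by convention, so this is a sketch of the idea rather than of any actual OODBMS.

```python
class EmployeeStore:
    """Hypothetical encapsulated collection: users see only its methods."""

    def __init__(self, rows):
        self._rows = list(rows)  # internal state, hidden by convention

    def in_department(self, dept):
        # The one question the object programmer anticipated.
        return sorted(r["emp_no"] for r in self._rows if r["dept"] == dept)


rows = [
    {"emp_no": 1, "dept": "sales", "salary": 50},
    {"emp_no": 2, "dept": "ops", "salary": 70},
    {"emp_no": 3, "dept": "sales", "salary": 90},
]

store = EmployeeStore(rows)
anticipated = store.in_department("sales")

# An unanticipated question ("who earns more than 60?") has no method,
# so the user cannot ask it until someone writes and ships new code.
can_ask_salary_question = hasattr(store, "earning_more_than")


def restrict(relation, predicate):
    """Generic restriction: keep the rows that satisfy any predicate."""
    return [row for row in relation if predicate(row)]


# With rows exposed under named attributes, the same question can be
# posed on the spot, with no change to the data's definition.
ad_hoc = sorted(r["emp_no"] for r in restrict(rows, lambda r: r["salary"] > 60))
```

The encapsulated store answers only the department question; the salary question works only against the unencapsulated rows, where attribute names are visible and a generic operator applies to any predicate.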
As Chris Date points out: "It is odd that so many [object proponents] tend to use employees, departments, and so forth as examples of object classes. An object class is a type, of course, and so those [proponents] are forced to define a "collection" for those employees. What is more, those "collections" typically omit the all-important attribute names, so they are not relational tables. As a consequence, they do not lend themselves very well to the formulation of ad hoc queries, declarative integrity constraints, and so forth -- a fact that advocates of the approach themselves often admit, apparently without being aware that it is precisely the lack of attribute names that causes the problems."
"... an article posted in May to the open news forums ... entitled "Why aren't you using an OODBMS?" ... [in which] a Georgia Tech computer science student submitted and independent research project discussing the relative merits of OODBMS vs. RDBMS. He listed six advantages to using an OODBMS. The most important reasons were the existence of one data model, the lack of an impedance mismatch and a faster interaction with complex data."
Student research should, by its nature, be read with considerable care, for obvious reasons. Much more so when, as I have repeatedly demonstrated, academia does not understand fundamentals any better than the industry -- see Denormalization for performance -- Et tu academia? In fact, the very column under consideration offers proof of that: "Marge has a doctorate in applied mathematics from Northwestern University [my alma mater, should I return my degree?] and has recently left an academic position as an associate professor of mathematics to become the chief technical officer at a struggling dot-com." If a Ph.D. in mathematics -- on which relational theory is based! -- does not realize the critical distinction between RDBMSs and OODBMSs -- which lack a scientific foundation -- what can be expected from students, or the vast majority of practitioners who have no formal exposure to fundamentals? How is it possible for anybody with scientific training to consider those three reasons "most important" for using an OODBMS when:
- the fundamental problem of OODBMS is precisely that it does not provide a well-defined data model
- the OO approach does the exact opposite of what a solution to the so-called "impedance mismatch" ought to be
- performance -- no matter how "complex" the data -- is an implementation issue and, thus, has nothing to do with RDBMS vs. OODBMS
"Marge" does recognize that the single disadvantage claimed by the student, namely "any change to a data structure in an OODBMS requires a change in the objects that encapsulate that data" (also known as lack of data independence, which relational technology was explicitly invented to resolve), "could be pretty serious if your data structures evolve with a project." But this is much too mild an assessment of the costly deficiencies of OODBMS's. It leaves the impression that OODBMS's can, and perhaps one day will, be ready for prime time as superior to RDBMS's, when that cannot possibly be the case.
What Marge finds fascinating about the [student's] posting [is that] "within a few hours, many hundreds of people had read the student's thesis and had responded with statements that were truly impressive in their number and quality" (as of this writing, there are over 100 pages). That is, in my experience, predictable when topics are "hot," which usually means way more heat than light, and that is very rarely impressive. Clear and precise relational thinking leads to succinctness, and genuine controversies are quite rare. The opposite is true of OO thinking, which generates volumes of drivel due to its fuzzy and confused nature. An excellent example (also academic) is one of the weekly quotes at Database Debunkings.
"Object-oriented systems can be classified into two main categories -- systems supporting the notion of class and those supporting the notion of type ... Although there are no clear lines of demarcation between them, the two concepts are fundamentally different." -- E. Bertino & L. Martino, "Object oriented database systems: Concepts and architectures"
I rest my case.
About the Author
Fabian Pascal has an international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. His third book is called Practical issues in database management: A guide for the thinking practitioner.
This was first published in September 2001