At Database Debunkings I recently posted an exchange on whether or not MySQL is a DBMS, let alone relational, and...
invited readers to test themselves on their ability to see through the prevailing fallacies and misconceptions, including those by designers of DBMS products. I am debunking the exchange in this month's column. In what follows, HT is Heikki Tuuri, author of InnoBase, a product he refers to as InnoDB and describes as a "back-end which provides transactions to MySQL" (a concept which, by itself, reveals a misconception, can you figure it out?). LF is Lee Fesperman.
DW: I think the issue is with lack of support for sub-selects (I'm sure there is a better name for them, but I don't know it) is a bummer. I've read the documentation and realize that using JOINs can help, but it's still a problem for me.
HT: Did Codd demand SQL sub-queries in his relational algebra? Probably not, because they are not first-order definable, and SQL was only later developed to suit practical needs.
LF: No, SQL sub-queries are not part of relational algebra. However (as Bob pointed out), SQL is not a very complete query language and needs sub-queries to bolster it. Without them, SQL suffers badly in power and capability.
HT: Bob's example is a second-order formula, because it has the aggregate function COUNT() in it. It is not a formula of the basic relational algebra. Note that, for example, you cannot express transitive closure in basic relational algebra, though you can calculate it in Oracle with CONNECT BY ... A tradition of query languages in 'relational' databases is that their expressive power is restricted to allow simple optimization of queries, mainly joins. If you want a lot of expressive power you can use Prolog or C as your query language. Calling a C compiler a 'relational database' is far from what most people understand with 'relational', though.
Note: Chris Date asks "What on earth is all this about?"
SQL sub-queries (or nested queries, to which SQL owes the 'structured' in its name), are very instructive with regard to how the database industry operates. SQL started as a prototype query language for IBM's SystemR research project, intended to test the feasibility of implementing a RDBMS after Codd, an IBM research fellow at the time, invented relational technology. Sub-queries were built into SQL to avoid the explicit expression of relational operations such as JOIN, the argument being that users are not familiar with such operations and should be insulated from them.
The leading and very profitable IBM product then was IMS, a hierarchic DBMS renowned for its complexity and inflexibility. Consequently, the feedback from testers was positive and persuaded a reluctant IBM to develop relational products, at which point they were advised to design a complete, correct and truly relational data language. In particular, Chris Date, also at IBM at the time, argued that sub-queries make SQL redundant -- queries can be expressed in more than one way. Since performance would vary across different formulations, potential optimization difficulties would arise, forcing users into performance considerations -- a violation of data independence. I wrote in my first book that I could almost hear IBM's reaction: "Please don't bother us with all this theoretical stuff, we have practical things to do." They rejected Codd's own relational language, Alpha, and put SQL, sub-queries and all, in the public domain. The rest is history.
In 1989 I published in (now defunct) Database Programming and Design performance results from running seven semantically equivalent, but syntactically different SQL queries on five PC DBMSs and got wild variations ranging from 2 to 2500 seconds. The only product which performed consistently was Ingres (subsequently acquired by Computer Associates) which, at the time, had a QUEL engine with a SQL interface. QUEL was a different relational data language, one property of which was lack of sub-queries, meaning that however you expressed a query in SQL, it would map to only one QUEL execution (Chris Date designed the mapping), hence the ease of optimization and consistency. Ironically, Ingres faltered due, in large part, to its initial lack of compatibility with SQL (and, therefore, IBM); by the time they came up with the SQL interface it was too late -- one case among many that dispells the notion of best technology always winning in the marketplace. Even more ironically, relational expressions could not be completely avoided (e.g. UNION) and, ultimately, all relational expressions (JOIN, MINUS, INTERSECT) were added to SQL anyway, quadrupling (or more?) redundancy.
Oracle's CONNECT BY extension to SQL, like other such extensions, violates relational closure, by producing results with meaningful ordering and duplicates; I documented the practical implications in Chapter 7 ("Climbing Trees in SQL") in Practical issues in database management. As I demonstrate there, it is possible to design a data language that handles hierarchies and recursive queries relationally, but vendors, as usual, flouted relational principles.
The point: While nested expressions are certainly needed in a data language, SQL's sub-queries specifically are not a relational requirement and are only a partial, problematic implementation of such a feature.
DW: The main issue is that MySQL doesn't offer foreign key restraints [sic] with its native engine (which is where the performance comes from). That means it isn't relational at all. Or triggers. Or transactions of any kind.
HT: I have written one of the back-ends that provide transactions to MySQL, InnoDB. Berkeley DB (BDB) is another one. Referential integrity is on my to-do list for this month, though it will probably be delayed a bit. Transactions are available through BDB and InnoDB type tables. Referential integrity constraints can be described as a property of an 'active' database. The term 'active' is here because [sic] you could define them with triggers. Active databases became popular in the '80s when Sybase brought stored procedures. InnoDB type tables will have referential integrity constraints in September. MySQL nowadays supports transactions, logging, and crash recovery.
LF: My emphasis is on the 'management' part of DBMS. A DBMS should control and manage its data. Transactions are integral to data management, and I've seen conflicting reports on the efficacy of MySQL transaction support. Constraint support in the DBMS, especially referential integrity, is also integral to data management. MySQL fails in this area.
Transaction support for recovery is a database, not relational, function. A product lacking such support cannot guarantee database consistency at all times and, therefore, is not a true DBMS. It is worth mentioning here that Chris Date now believes that it is not the transaction, but the statement that is the unit of integrity (see Constraints and Predicates: A Brief Tutorial, Parts 1-3).
Integrity is, of course, an essential database function too -- after all, a database schema is nothing but the sum total of all the integrity constraints pertinent to a database. Indeed, that is the best approximation that a DBMS can have of the meaning of a database -- its semantics (something of which XML proponents are not even aware; mark-up tags are hardly sufficient to express meaning). The database approach replaced the application files approach precisely to migrate database functions such as integrity away from applications and to the DBMS. Calling a product that does not enforce all integrity constraints at the database level (thus, leaving such enforcement to applications) a DBMS is, in a sense, a contradiction in terms and indicates fundamental misunderstanding. This is also true for (so-called) DBMS products that permit applications to subvert their integrity enforcement.
Furthermore, referential constraints are only one kind of constraint -- database constraints -- that must be enforced by a RDBMS. Relational domain, column and table constraints must also be enforced at the database level, see Chapter 2 ("The Rule of Rules") in Practical issues in database management.
The notion of so-called "active DBMS" (not database!!!) is bunk. Stored procedures, whether triggered or not, are not a substitute for declarative integrity support for a variety of reasons. They are a cop-out by vendors.
The point: Both recovery and integrity are database functions that must be built into the DBMS. Complete, declarative support of all four types of integrity constraint -- domain, column, table and database -- is an essential relational requirement. Because recovery and referential integrity are supported by external extensions, MySQL permits subversion, violating both database and relational principles. The MySQL author is quite explicit about this, as one of our Weekly Quotes reveals: "The only rules that should reside in a database are referential integrity. (and sometimes that isn't really necessary). It is also best to keep rules out of your data access code (hard-coding WHERE values). Business rules should be centralized in Java business objects for better manageability, scalability, etc. Don't let pushy DBAs tell you otherwise. Rules in a database slow down development as well as data access time."
Continued in Part II...
About the author
Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for more than 15 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on database technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, IRS. He is founder and editor of Database Debunkings, a web site dedicated to dispelling prevailing fallacies and misconceptions in the database industry, where C.J. Date is a senior contributor. He has contributed extensively to most trade publications, including Database Programming and Design, DBMS, DataBased Advisor, Byte, Infoworld and Computerworld. His third book, "Practical issues in database management" (Addison Wesley, June 2000), serves as text for a seminar bearing the same name. He can be contacted at email@example.com.
For More Information
- What do you think about this column? E-mail the Editor at firstname.lastname@example.org with your feedback.
- The Best Web Links on the relational model
- Post your technical database questions--or help out your peers by answering them--in our live discussion forums.
- Ask the Experts! Our database design, SQL, Oracle, DB2, SQL Server, metadata, and data warehousing gurus will answer your toughest questions.