As simple as possible, but not simpler

Fabian on data models, defining information, and the appreciation of theory.

The title of this article is one requirement of a good data model. In my seminars, presentations and writings, when I describe the advantages of the relational model for database management, I always count simplicity as a major one, something lost in all the client/server, ODBMS and XML hoopla. I maintain that the vast majority of practitioners, who are self-taught and lack an adequate fundamental education, have been working with such unnecessarily complex technologies and products, and employing such unnecessarily complex practices, that they can't cope with simplicity anymore: when something is simple, they must complicate it before they're comfortable with it.

It is in this spirit that I am responding here to a reader reaction to one of my recent columns. He writes:

To a hammer, everything looks like a nail reads like a continuation of an epic story, of which Fabian is not only a writer, but a true believer and participant. There are few people in IT industry who rise above brands and technologies to heights of scientific generalization. Fabian does so by placing his feet firmly on the base of relational theory. That allows him to see things which we, mere practitioners, do not recognize beyond the "fuzziness" of physical considerations. And we cannot appreciate the theory either. I judge this by the practicality of questions asked in the "DBA Water Cooler." This note is a teaser for a more abstract type of discussion.

I appreciate the sentiment. But let me first point out that "placing one's feet firmly on theory" and "rising to heights of scientific generalization" is what a scientist, not a believer, does. The relational model is science: it is nothing but the application of predicate logic and set mathematics to database management. Belief, on the other hand, ought to be the domain of religion and psychology, not data management. It is ironic that we relational proponents, firmly rooted in science, are often called "religious" by our detractors, who believe, often dogmatically, in all sorts of nonscientific approaches that cannot be adequately defended or justified (it certainly helps to believe if it's profitable). I always ask those who propose or believe in alternatives to relational technology what exactly they replaced logic and math with (see, for example, On Respected Technical Analysts, On the So-Called Associative Model of Data), but I have yet to get an answer.

Physical considerations do not have to be fuzzy and, in fact, can be quite precise and clear. It's the thinking that is fuzzy, and in the industry it is so more often than not. It's not just fuzziness, though, that is at issue when it comes to implementation details. It is the confusion of those details with the logical level (see the several articles on denormalization in this series and The Logical-Physical Confusion).

Physical considerations are very important, of course, but they cannot substitute for conceptual and logical ones; you must have a complete, clear and consistent conceptual model and logical model to implement in the first place. Unfortunately, most practitioners rush into implementation before they have either. The problems caused by inadequate conceptual or logical design cannot be resolved at the physical level (see The Dangerous Illusion: Normalization, Performance and Integrity).

Working in IT does not require a scientific background, does it? To be an IT practitioner, all you need is to enjoy thinking logically and to hang around for a while. One has got a Computer Science degree; very well, but this is irrelevant to Human Resources. A logical-minded person can master software development by doing it. Computer Science is a misnomer, because there is no scientific methodology in the way we build Information Technologies. There is only IF/THEN logic. It is very attractive because of its intuitiveness and determinism. The problem comes when attempting to build a system complex enough, because then it is not deterministic any longer. Then, to evolve a complex system including its integration with other systems--that is a real challenge! No wonder the demand for IT professionals cannot be saturated. Yes, the industry has moved from Assembler to Java development, but it is the same natural logic that is required from a professional, because there is no more science in OO than in procedural development.

It is not entirely clear if the reader means the start of this paragraph to be tongue in cheek, but I will assume so. "Thinking logically" is indeed one critical requirement for database practice. After all, what is a database other than logically organized data, and what is a DBMS other than a logic inference engine? But the problem is precisely that too many practitioners don't think logically. In large part the reason is inadequate education. This does not mean practitioners need to be scientists (although a scientific background would not hurt). Rather, it means that they need to know and understand the practical implications of science. For example, while most practitioners work with tables in SQL databases, very few have a good grasp of what exactly the tables represent. They are certainly not aware that the rows represent propositions and the tables represent predicates. Neither are they aware that it is the theory that gives a DBMS the capability to guarantee correctness of the database and of the results derived from it. But it's not entirely their fault: the industry, including academia, simply does not expose them to this or require them to know it. What is more, any attempt to expose them to it is rejected by the industry (see my first article in this series and It's All in the Jobs).
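To make the point about predicates and propositions concrete, here is a minimal sketch in Python rather than in any DBMS language. Everything in it is an assumption made for illustration only: the EMPLOYEES table, its columns, the wording of the predicate and the helper names are hypothetical. The table is modeled as a set of rows, the predicate is a sentence with placeholders, and each row, substituted into the predicate, yields a proposition that the database asserts to be true.

```python
from typing import FrozenSet, NamedTuple


class Employee(NamedTuple):
    """One row of a hypothetical EMPLOYEES table."""
    emp_id: int
    name: str
    dept: str


# The table's predicate: a sentence with placeholders (hypothetical wording).
PREDICATE = "Employee {emp_id}, named {name}, works in department {dept}."


def proposition(row: Employee) -> str:
    """Substitute a row's values into the predicate to obtain a proposition."""
    return PREDICATE.format(**row._asdict())


def is_asserted(row: Employee, table: FrozenSet[Employee]) -> bool:
    """A proposition is asserted true exactly when its row is in the table."""
    return row in table


def restrict_by_dept(table: FrozenSet[Employee], dept: str) -> FrozenSet[Employee]:
    """A query derives a new set of true propositions from the existing ones."""
    return frozenset(row for row in table if row.dept == dept)


# The table body is a set: the same proposition cannot be stated twice.
EMPLOYEES = frozenset({
    Employee(1, "Ada", "R&D"),
    Employee(2, "Bob", "Sales"),
    Employee(1, "Ada", "R&D"),  # duplicate row, absorbed by the set
})

if __name__ == "__main__":
    for row in sorted(EMPLOYEES):
        print(proposition(row))
```

Under these assumptions, a row that is absent from the table is simply a proposition the database does not assert, and the result of a query such as `restrict_by_dept(EMPLOYEES, "Sales")` is itself a set of rows that can be read back as propositions through the same predicate.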

Relational theory has provided a very effective starting point for building complex systems by separating the data from the execution logic. There is no other theory to date that can supersede the relational theory. I would argue that THE THIRD MANIFESTO is itself too internally complex to help building complex systems. My question is as follows: can someone give a fundamental definition of "information"?

I have two reactions to this quite common attitude in the industry (which led to the title of this article). I wonder whether bridge builders, for example, would reject the laws of physics as too complicated for building complex bridges. Or would the builders of atomic reactors reject quantum physics as too complex to rely on? Once in a blue moon we do have individuals or institutions who do that, but they are the rare exceptions, not the rule, and they disappear quite fast. In the database field, however, long and successful careers and expert reputations can be built on the systematic dismissal of theory as either impractical or too complex.

What makes this even worse is that not adhering to theory actually makes database management much more complex, not simpler. The Third Manifesto provides the simplest reliable framework for building DBMSs. All other approaches are either more complex, or incapable of guaranteeing correctness, or both. To the extent that practitioners find them simpler or easier, it is, as I already mentioned, because they don't know any better, or because they ignore a loss of reliability of which they are unaware (see The Dangerous Illusion: Normalization, Performance and Integrity).

Appreciation of theory is not something that most people can do intuitively. One would hope that at least those going through a formal process of learning the scientific method would know better, but American culture does not place sufficient value on theory, and even dismisses it as "academic" and, therefore, impractical; this society is quite anti-intellectual. Social and economic rewards and punishments are determined by the culture, and it should therefore come as no surprise that most database practitioners either do not go through a scientific education at all or, if they do, are given no incentive, and quite a lot of disincentives, to incorporate it in their practice. Those few who try discover very quickly that it is an uphill battle, and that it is much easier and more profitable to just go with the flow.

I have a simple definition of information, which produces useful definitions of a database and a DBMS: meaningfully organized, intelligently accessible data.

About the author

Fabian Pascal has an international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for more than 15 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on database technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, IRS. He is founder and editor of Database Debunkings, a web site dedicated to dispelling prevailing fallacies and misconceptions in the database industry, where C.J. Date is a senior contributor. He has contributed extensively to most trade publications, including Database Programming and Design, DBMS, DataBased Advisor, Byte, Infoworld and Computerworld. His third book, Practical Issues in Database Management (Addison Wesley, June 2000), serves as text for a seminar bearing the same name. He can be contacted at editor@dbdebunk.com.

This was first published in June 2002
