In an exchange with a visitor at my Database Debunkings site who referred to the common notion of "denormalizing for performance," I stated the following: "Normalization is a logical concept, performance is determined at the physical level. Therefore, it is impossible to denormalize for performance." That response evoked the following question from another visitor:
First, let's understand my argument by using an analogy. Suppose I ask you to retrieve two sets of books, one set of two books and one set of four books, from the library. If I asked you which set will take you longer to retrieve, would you be able to tell me? When I ask practitioners this question, most say no: they realize they cannot know how long each set will take, because that will be determined not by the number of books per se, but by a whole set of physical characteristics of the library: its size, how many floors it has, how are the books organized/distributed, is there a catalog, are there elevators, how fast are they, etc. It is entirely possible for the four books to take less time if, for example, they are all on the same subject and stored together, while the two are on different subjects and stored far apart (and if any of the books were used by others, retrieval could take a long time indeed).
Normalization, which is a purely logical design principle, increases the number of logical tables in the database--books in our analogy--so the number of logical tables to be accessed by a query does not, per se, say anything about the performance, because performance is determined by physical implementation characteristics such as hardware, size and physical design of the database, storage and access methods, DBMS optimizer, concurrent access, etc.--the counterparts to the library physical characteristics. In other words, a more normalized database does not necessarily perform worse than its less normalized version ...
To continue reading for free, register below or login
To read more you must become a member of SearchOracle.com
');
// -->

and, conversely, undernormalization does not necessarily improve performance.
If so, why is the concept "denormalization for performance" so entrenched in the industry? When I ask practitioners this question, I generally get a response along the following line: "Well, it's all fine and dandy 'in theory', but we live in the real world and with current products I cannot get acceptable performance unless I denormalize my databases."
Let us assume, for the sake of argument, that current products do always require undernormalization to perform well. Note very carefully that even in this case the conclusion to be drawn is not that normalization causes performance problems, but rather that the physical implementation of DBMS products, or physical database design, or both, are not good enough to support correctly designed databases, which is exactly what fully normalized databases are!
This follows logically from the fact that
- it is physical implementation, not logical design, that determines performance
- denormalization imposes an integrity control burden that is not only prohibitive, but may also requires application code, because SQL DBMSs fail to support the necessary integrity constraints (as I demonstrate in chapters 5 and 8 in my book)
What this means is that if, as is claimed, SQL DBMSs require denormalization to perform well, the necessary controls will not be implemented, opening databases to almost certain corruption. (I dare you to find any denormalized database with such controls in place!) Full normalization is the correct design precisely because it does not impose the additional integrity control burdens!
But, you may say, if I must denormalize to get good performance, what difference does it make who is to blame? Practically, it is the end result that counts. Well, what you blame does make a difference: if users believe, erroneously, that normalization is the culprit, they will never demand, via their collective purchase pattern, product improvements--they will continue to denormalize without controlling redundancy, oblivious to the risks. If, on the other hand, they understand the perils of denormalization and the advantages of correctly designed databases, they will, at least in the long run, expect better implementations and will not accept less.
Here's what I replied to the question above:
About the Author
Fabian Pascal has a national and international reputation as an independent database technology analyst, industry critic, consultant, author and lecturer. For more than 13 years he held various analytical, and management positions in the private and public sectors, was affiliated with Codd & Date, has taught and lectured at the business and academic levels, and advised vendor and user organizations on database technology and implementation. He is co-founder and editor of Database Debunkings, a web site dedicated to dispelling prevailing fallacies and misconceptions in the database industry, with C.J. Date as senior contributor. He has contributed extensively to most trade publications, and his third book, Practical Issues in Database Management--A Guide for the Thinking Practitioner, was recently published by Addison Wesley.
For More Information