Normalization and performance: Never the twain shall meet!

Fabian Pascal discusses the concepts of normalization and performance.

I am looking for reasons to normalize in Sybase. The more traditional arguments of stability, flexibility, reduced redundancy, and adaptability have been presented. The problems of anomalies within the information are known. In a Sybase environment, what are the performance implications of having a very wide table versus smaller normalized tables? --Forum question at datamodel.org

In an exchange with a visitor at my Database Debunkings site who referred to the common notion of "denormalizing for performance," I stated the following: "Normalization is a logical concept, performance is determined at the physical level. Therefore, it is impossible to denormalize for performance." That response evoked the following question from another visitor:

I read your "Exchange on Database Fundamentals" and I have a question about one of your statements. GW states that one of the "four great lies" is "denormalize for performance." You state that normalization is a logical concept and since performance is a physical concept, denormalization for performance reasons is impossible; i.e., it doesn't make sense. What terms would you use to describe changing the physical database design to be different from the logical design to enhance performance? You imply that this is not called denormalization because normalization is a logical concept.

First, let's understand my argument by using an analogy. Suppose I ask you to retrieve two sets of books from the library, one set of two books and one set of four. If I asked you which set will take longer to retrieve, would you be able to tell me? When I ask practitioners this question, most say no: they realize they cannot know how long each set will take, because that is determined not by the number of books per se, but by a whole set of physical characteristics of the library: its size, how many floors it has, how the books are organized and distributed, whether there is a catalog, whether there are elevators and how fast they are, and so on. It is entirely possible for the four books to take less time if, for example, they are all on the same subject and shelved together, while the two are on different subjects and stored far apart (and if any of the books is being used by someone else, retrieval could take a very long time indeed).

Normalization, which is a purely logical design principle, increases the number of logical tables in the database--the books in our analogy--but the number of logical tables a query must access does not, per se, say anything about performance, because performance is determined by physical implementation characteristics such as hardware, the size and physical design of the database, storage and access methods, the DBMS optimizer, concurrent access, and so on--the counterparts of the library's physical characteristics. In other words, a more normalized database does not necessarily perform worse than a less normalized version of it and, conversely, undernormalization does not necessarily improve performance.
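To make the distinction concrete, here is a minimal, hypothetical sketch of the two designs the forum question contrasts: one wide table versus its further normalized equivalent. The table and column names are mine, purely for illustration, and the SQL is generic; data types and constraint syntax vary by product.

  -- A "wide," undernormalized design: facts about a customer are
  -- repeated in every order row for that customer.
  CREATE TABLE orders_wide (
      order_no      INT           NOT NULL PRIMARY KEY,
      customer_no   INT           NOT NULL,
      customer_name VARCHAR(40)   NOT NULL,  -- repeated per order
      customer_city VARCHAR(40)   NOT NULL,  -- repeated per order
      order_date    DATE          NOT NULL,
      amount        DECIMAL(10,2) NOT NULL
  );

  -- The further normalized equivalent: two logical tables, each fact
  -- recorded exactly once.
  CREATE TABLE customers (
      customer_no   INT           NOT NULL PRIMARY KEY,
      customer_name VARCHAR(40)   NOT NULL,
      customer_city VARCHAR(40)   NOT NULL
  );

  CREATE TABLE orders (
      order_no      INT           NOT NULL PRIMARY KEY,
      customer_no   INT           NOT NULL REFERENCES customers,
      order_date    DATE          NOT NULL,
      amount        DECIMAL(10,2) NOT NULL
  );

Nothing in these definitions says anything about how either design will perform: whether a query that joins customers and orders runs slower than the same query against orders_wide depends entirely on the physical factors listed above, none of which appears at the logical level.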

If so, why is the concept of "denormalization for performance" so entrenched in the industry? When I ask practitioners this question, I generally get a response along the following lines: "Well, it's all fine and dandy 'in theory', but we live in the real world and with current products I cannot get acceptable performance unless I denormalize my databases."

Let us assume, for the sake of argument, that current products do always require undernormalization to perform well. Note very carefully that even in this case the conclusion to be drawn is not that normalization causes performance problems, but rather that the physical implementation of DBMS products, or physical database design, or both, are not good enough to support correctly designed databases, which is exactly what fully normalized databases are!

This follows logically from the fact that

  1. it is physical implementation, not logical design, that determines performance
  2. denormalization imposes an integrity control burden that is not only prohibitive, but may also require application code, because SQL DBMSs fail to support the necessary integrity constraints (as I demonstrate in chapters 5 and 8 of my book)

What this means is that if, as is claimed, SQL DBMSs require denormalization to perform well, the necessary controls will not be implemented, opening databases to almost certain corruption. (I dare you to find any denormalized database with such controls in place!) Full normalization is the correct design precisely because it does not impose the additional integrity control burdens!
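To see what such controls would look like, consider the hypothetical orders_wide table sketched earlier: every row for a given customer must agree on the repeated customer columns. Stated declaratively, the control would be something like the assertion below, which is in the SQL standard but, to my knowledge, not supported by the mainstream SQL products; in practice it must be approximated with triggers or application code, if it is implemented at all.

  -- Redundancy control for the denormalized design: all rows with the
  -- same customer_no must carry the same customer_name and customer_city.
  -- CREATE ASSERTION is standard SQL but rarely, if ever, implemented.
  CREATE ASSERTION customer_facts_consistent CHECK (
      NOT EXISTS (
          SELECT *
          FROM   orders_wide a, orders_wide b
          WHERE  a.customer_no   =  b.customer_no
          AND   (a.customer_name <> b.customer_name
              OR a.customer_city <> b.customer_city)
      )
  );

In the fully normalized design no such control is needed: each customer fact is stored exactly once, so it cannot contradict itself.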

But, you may say, if I must denormalize to get good performance, what difference does it make who is to blame? Practically, it is the end result that counts. Well, what you blame does make a difference: if users believe, erroneously, that normalization is the culprit, they will never demand, via their collective purchase pattern, product improvements--they will continue to denormalize without controlling redundancy, oblivious to the risks. If, on the other hand, they understand the perils of denormalization and the advantages of correctly designed databases, they will, at least in the long run, expect better implementations and will not accept less.

Here's what I replied to the question above:

What I think you are referring to is data independence: it's not [just] that the logical design is "different" than the physical...but rather that the logical is insulated from the physical. That means it should be possible to make any physical changes deemed necessary for performance without affecting what users see at the logical level.

Chris Date calls such changes "physical denormalization." While I understand what he means, I don't particularly like the term because normalization is a set of design principles based on (logical) column dependencies and I don't exactly see how this applies at the physical level. I call such changes simply "physical design changes."
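For example, again using the hypothetical tables from the earlier sketch, adding an index is a purely physical design change: it may well affect performance, but the tables users see and the queries they write are exactly the same before and after.

  -- A physical design change: an index to speed up access by customer.
  CREATE INDEX orders_customer_idx ON orders (customer_no);

  -- The logical level is untouched; this query reads the same whether or
  -- not the index exists, and regardless of how the rows are stored.
  SELECT c.customer_name, o.order_date, o.amount
  FROM   customers c, orders o
  WHERE  o.customer_no = c.customer_no
  AND    c.customer_no = 123;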

 

About the Author

Fabian Pascal has a national and international reputation as an independent database technology analyst, industry critic, consultant, author and lecturer. For more than 13 years he held various analytical and management positions in the private and public sectors, was affiliated with Codd & Date, has taught and lectured at the business and academic levels, and advised vendor and user organizations on database technology and implementation. He is co-founder and editor of Database Debunkings, a web site dedicated to dispelling prevailing fallacies and misconceptions in the database industry, with C.J. Date as senior contributor. He has contributed extensively to most trade publications, and his third book, Practical Issues in Database Management--A Guide for the Thinking Practitioner, was recently published by Addison-Wesley.

 

This was first published in March 2001
