Q

Table design: go across or go down, continued

In reference to Pat Phelan's answer to the question of whether to "go across or go down": If you understand normalization, then you know that Mr. Phelan is completely wrong.

First normal form simply means ensuring that each attribute represents an atomic, or non-decomposable value. Both the "go across" and "go down" versions of his monthly expenses example meet this requirement.

Second normal form means satisfying 1NF and ensuring that each attribute depends on the entire key. Given that the key for "go down" would be the month and the expense type, and the key for "go across" would be the month, both versions meet this requirement.

Third normal form means satisfying 2NF and ensuring that no attribute represents a fact about another attribute. This is where the "go down" version fails. With one column representing the expense type and another column representing the expense, the expense type column represents a fact about the expense column. In other words, they are not mutually independent.

Aside from this discussion of normalization, the "go down" version is usually the worse option, for several reasons:

1) To store a set of expenses for a month, you have to do an insert for every type of expense. This is a comparatively expensive set of operations (storing the expenses for a month with the "go across" version requires only one insert).

2) Querying is far more expensive with the "go down" version. Here is a clarifying example. Assume the following tables:

name:go_down
month expense_type amount
----- ----------- -------
jan   salary          500
jan   rent            200
feb   salary          300
feb   rent            200
 
name:go_across
month salary rent
----- ------ ----
jan      500  200
feb      300  200
 
Now, as a manager, I ask: in what months was salary greater than 400 and rent greater than 180? Here are the SQL select statements for the two tables:
go_down:
select month from go_down
where expense_type = 'salary'
and amount > 400
intersect select month from go_down
where expense_type = 'rent'
and amount > 180
 
go_across: 
select month from go_across 
where salary > 400 and rent > 180
 
As you can see, the query for go_across is significantly more efficient than the query for go_down.
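The two points above can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for whatever RDBMS is really in use, and the counters `down_inserts` and `across_inserts` are names made up for illustration. It loads the sample data, counts the inserts each design needs, and runs both queries. It shows that the queries agree and how many statements each design requires; it does not measure disk reads or run time.

```python
import sqlite3

# In-memory database; the column types are an assumption, the source only shows the data.
conn = sqlite3.connect(":memory:")
conn.execute("create table go_down (month text, expense_type text, amount integer)")
conn.execute("create table go_across (month text, salary integer, rent integer)")

# Sample data from the tables above, keyed by month.
expenses = {
    "jan": {"salary": 500, "rent": 200},
    "feb": {"salary": 300, "rent": 200},
}

down_inserts = 0
across_inserts = 0
for month, by_type in expenses.items():
    # "go down": one insert per expense type
    for expense_type, amount in by_type.items():
        conn.execute("insert into go_down values (?, ?, ?)",
                     (month, expense_type, amount))
        down_inserts += 1
    # "go across": one insert per month
    conn.execute("insert into go_across values (?, ?, ?)",
                 (month, by_type["salary"], by_type["rent"]))
    across_inserts += 1

down_months = [row[0] for row in conn.execute("""
    select month from go_down
    where expense_type = 'salary' and amount > 400
    intersect
    select month from go_down
    where expense_type = 'rent' and amount > 180
""")]

across_months = [row[0] for row in conn.execute(
    "select month from go_across where salary > 400 and rent > 180"
)]

print(down_inserts, across_inserts)
print(down_months, across_months)
```

Both queries return the same month for this data; the difference is in how much work each design asks of the engine to get there.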

Mr. Phelan says an advantage of the go_down version is how easy it is to construct a query to get the total expenses for the month. He's right - it is easy to construct the query. Here it is:

select sum(amount) from go_down where month = 'jan'
 
whereas the query for the go_across example is:
select salary + rent from go_across where month = 'jan'
 
It is a bit easier to construct the query for the go_down version. BUT it is more efficient to run the query using the go_across version! With the go_down version, the RDBMS has to do more disk reads to get the answer.
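As a quick check that the two total queries are equivalent, here is a small sketch, again assuming Python's built-in sqlite3 module and the sample rows shown earlier. It only confirms that both designs give the same January total; the claim about disk reads depends on the engine and is not something this sketch measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table go_down (month text, expense_type text, amount integer);
insert into go_down values ('jan', 'salary', 500);
insert into go_down values ('jan', 'rent', 200);
insert into go_down values ('feb', 'salary', 300);
insert into go_down values ('feb', 'rent', 200);

create table go_across (month text, salary integer, rent integer);
insert into go_across values ('jan', 500, 200);
insert into go_across values ('feb', 300, 200);
""")

# "go down": aggregate over one row per expense type
total_down = conn.execute(
    "select sum(amount) from go_down where month = 'jan'"
).fetchone()[0]

# "go across": add the columns of a single row
total_across = conn.execute(
    "select salary + rent from go_across where month = 'jan'"
).fetchone()[0]

print(total_down, total_across)
```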

Lastly, note that I put a caveat on my statement above in saying that the "go down" version is USUALLY worse than the "go across" version. Specifically, the "go down" version is worse than the "go across" version when you have a limited number of "types" that can be specified beforehand. Mr. Phelan mentions MS Money and Quicken, saying that they are normalized, and they use the "go down" method. Yes, MS Money and Quicken use the "go down" method, but we now see that they are denormalized. But in this case the "go down" method is the best option. And it's for the simple fact that the developers of MS Money and Quicken do not know what expense types users will define and want to use when entering transactions. Every database designer should be cognizant of these pluses and minuses when they design their databases.

A


I made this mistake in a budget application that I designed the database structure for in roughly 1986, and I cursed the design until we retired the application in 1996! This is a subtle kind of error that often sneaks up on the inexperienced data modeler, or one who has never taken a significant application from "cradle to grave" through the life cycle. Silly me, I assumed that years always had twelve months!

While it seems like you can make a choice to normalize a table in many different ways, and still reach a manageable form, that is rarely if ever the case. The "across" form of the database violates the first normal form as described at http://databases.about.com/library/weekly/aa081901a.htm?terms=normalization because it has more than one kind of expense going across the rows. Naming one column "salary" and one "rent" doesn't make them any different than "expense1" and "expense2" from the standpoint of normalization.

The fundamental issue from my perspective isn't even the normalization. Hardware is cheap. Wetware (people) is expensive! You can afford enormous amounts of disk I/O in order to support a more flexible design, because disk latency is measured in milliseconds and schema changes are measured in hours (for VERY small changes) to months (for normal-size changes). The "go down" schema will support an arbitrary number of expense types, which can be changed at runtime by defining a new type. The "go across" schema can support only whatever existed when the application was written, meaning that any new or deleted expense type requires a person to find and modify every piece of affected code!
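To make the maintenance difference concrete, here is a minimal sketch assuming Python's built-in sqlite3 module and a hypothetical new expense type called `utilities` (not part of the original example). Under "go down" the new type is just another row and the existing total query keeps working. Under "go across" it takes a schema change, and the existing total query silently leaves the new column out until someone rewrites it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table go_down (month text, expense_type text, amount integer);
insert into go_down values ('jan', 'salary', 500);
insert into go_down values ('jan', 'rent', 200);

create table go_across (month text, salary integer, rent integer);
insert into go_across values ('jan', 500, 200);
""")

# "go down": a new expense type ('utilities' is hypothetical) is only data.
conn.execute("insert into go_down values ('jan', 'utilities', 50)")
total_down = conn.execute(
    "select sum(amount) from go_down where month = 'jan'"
).fetchone()[0]

# "go across": the same change needs a schema change first...
conn.execute("alter table go_across add column utilities integer default 0")
conn.execute("update go_across set utilities = 50 where month = 'jan'")

# ...and the query written before the change now misses the new column.
stale_total = conn.execute(
    "select salary + rent from go_across where month = 'jan'"
).fetchone()[0]

# Every such query has to be found and rewritten by hand.
fixed_total = conn.execute(
    "select salary + rent + utilities from go_across where month = 'jan'"
).fetchone()[0]

print(total_down, stale_total, fixed_total)
```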

If you are using an older database engine that is expensive in terms of primitive DML operations (like INSERT, UPDATE, and DELETE), you sometimes have to denormalize to accommodate the deficiencies of the engine. Older versions of DB2 and Oracle were notorious for these kinds of problems, but current versions don't have much of a performance penalty.

A few engines also had problems with indexing, such that you had to keep row sizes small to improve performance. MS-SQL had this problem before the release of version 6.5, but for the most part this is a thing of the past too.

While there might be a small performance penalty in the execution of the SELECT, the cost in terms of application rewrites to support an additional expense type is staggering. For the folks who prefer to load their database into memory and write linked list managers to improve performance, the difference might be significant. To a business manager who wants to write code once and then use it for months or years without a rewrite, the "go down" design is preferred hands down.
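For what it's worth, the manager's question from the reader's example (months where salary exceeded 400 and rent exceeded 180) can also be written against the "go down" table without an INTERSECT. The sketch below uses conditional aggregation with GROUP BY and HAVING. That technique is swapped in here for illustration and is not the query given in the question; Python's built-in sqlite3 module is assumed as the test engine. Whether it actually runs faster than the INTERSECT form depends on the engine and the indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table go_down (month text, expense_type text, amount integer);
insert into go_down values ('jan', 'salary', 500);
insert into go_down values ('jan', 'rent', 200);
insert into go_down values ('feb', 'salary', 300);
insert into go_down values ('feb', 'rent', 200);
""")

# One grouped scan: pick out each expense type per month with CASE,
# then filter the groups in HAVING. A missing type yields NULL,
# which fails the comparison and drops the month.
single_pass_months = [row[0] for row in conn.execute("""
    select month
    from go_down
    group by month
    having max(case when expense_type = 'salary' then amount end) > 400
       and max(case when expense_type = 'rent' then amount end) > 180
""")]

print(single_pass_months)
```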


This was last published in May 2002
