
Table design: go across or go down, continued

In reference to Pat Phelan's answer to the question of whether to "go across or go down": If you understand normalization, then you know that Mr. Phelan is completely wrong.

First normal form simply means ensuring that each attribute represents an atomic, or non-decomposable value. Both the "go across" and "go down" versions of his monthly expenses example meet this requirement.

Second normal form means satisfying 1NF and ensuring that each non-key attribute depends on the entire key. Given that the key for the "go down" version would be the month and the expense type, and the key for the "go across" version would be the month alone, both versions meet this requirement.

Third normal form means satisfying 2NF and ensuring that no attribute represents a fact about another attribute. This is where the "go down" version fails. With one column representing the expense type and another column representing the amount, the expense type column represents a fact about the amount column. In other words, they are not mutually independent.

Aside from this discussion of normalization, the "go down" version is usually a worse option, for several reasons:

1) To store a set of expenses for a month, you have to do an insert for every type of expense. This is a comparatively expensive set of operations (storing the expenses for a month with the "go across" version requires only one insert).

2) Querying is far more expensive with the "go down" version. Here's a clarifying example. Assume the following tables:

```
go_down:

month  expense_type  amount
-----  ------------  ------
jan    salary           500
jan    rent             200
feb    salary           300
feb    rent             200

go_across:

month  salary  rent
-----  ------  ----
jan       500   200
feb       300   200
```
Now, as a manager, I ask: in what months was salary greater than 400 and rent greater than 180? Here are the SQL select statements for the two tables:
```
go_down:

select month from go_down
where expense_type = 'salary'
and amount > 400
intersect
select month from go_down
where expense_type = 'rent'
and amount > 180

go_across:

select month from go_across
where salary > 400 and rent > 180
```
As you can see, the go_across query is both simpler and cheaper to run: the go_down version requires two scans of the table plus an intersection, where go_across needs only one.
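To make the comparison concrete, here is a runnable sketch of both queries against an in-memory SQLite database (SQLite and Python are my choice of vehicle here, not something from the original answer; the table names, columns, and data are those of the example above):

```python
# Build both example schemas and run the manager's question against each.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# The "go down" table: one row per (month, expense type).
cur.execute("CREATE TABLE go_down (month TEXT, expense_type TEXT, amount INTEGER)")
cur.executemany("INSERT INTO go_down VALUES (?, ?, ?)", [
    ("jan", "salary", 500), ("jan", "rent", 200),
    ("feb", "salary", 300), ("feb", "rent", 200),
])

# The "go across" table: one row per month, one column per expense type.
cur.execute("CREATE TABLE go_across (month TEXT, salary INTEGER, rent INTEGER)")
cur.executemany("INSERT INTO go_across VALUES (?, ?, ?)",
                [("jan", 500, 200), ("feb", 300, 200)])

# go_down needs two passes over the same table plus an INTERSECT...
down = cur.execute("""
    SELECT month FROM go_down WHERE expense_type = 'salary' AND amount > 400
    INTERSECT
    SELECT month FROM go_down WHERE expense_type = 'rent' AND amount > 180
""").fetchall()

# ...while go_across answers in a single pass with a plain WHERE clause.
across = cur.execute(
    "SELECT month FROM go_across WHERE salary > 400 AND rent > 180"
).fetchall()

print(down)    # [('jan',)]
print(across)  # [('jan',)]
```

Both queries return the same answer (only January qualifies); the difference is in how much work the engine does to get there.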

Mr. Phelan says an advantage of the go_down version is how easy it is to construct a query to get the total expenses for the month. He's right - it is easy to construct the query. Here it is:

```
select sum(amount) from go_down where month = 'jan'
```
whereas the query for the go_across example is:
```
select salary + rent from go_across where month = 'jan'
```
It is a bit easier to construct the query for the go_down version. BUT it is more efficient to run the query using the go_across version! With the go_down version, the RDBMS has to do more disk reads to get the answer.
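That both queries agree is easy to check; this sketch (again in-memory SQLite, my choice of vehicle, loaded with January's figures from the example) runs both total-for-the-month queries side by side:

```python
# Compare the two "total expenses for January" queries on the example data.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("CREATE TABLE go_down (month TEXT, expense_type TEXT, amount INTEGER)")
cur.executemany("INSERT INTO go_down VALUES (?, ?, ?)",
                [("jan", "salary", 500), ("jan", "rent", 200)])

cur.execute("CREATE TABLE go_across (month TEXT, salary INTEGER, rent INTEGER)")
cur.execute("INSERT INTO go_across VALUES ('jan', 500, 200)")

# go_down aggregates over several rows; go_across reads a single row.
down_total = cur.execute(
    "SELECT SUM(amount) FROM go_down WHERE month = 'jan'").fetchone()[0]
across_total = cur.execute(
    "SELECT salary + rent FROM go_across WHERE month = 'jan'").fetchone()[0]

print(down_total, across_total)  # 700 700
```

Same total either way; go_down reaches it by reading one row per expense type, go_across by reading one row total.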

Lastly, note the caveat in my statement above: the "go down" version is USUALLY worse than the "go across" version. Specifically, it is worse when you have a limited number of "types" that can be specified beforehand. Mr. Phelan mentions MS Money and Quicken, saying that they are normalized and that they use the "go down" method. Yes, they use the "go down" method, but as we now see, that makes them denormalized. In their case, however, the "go down" method really is the best option, for the simple reason that the developers of MS Money and Quicken cannot know what expense types users will define and want to use when entering transactions. Every database designer should be cognizant of these pluses and minuses when designing a database.

I made this mistake in a budget application that I designed the database structure for in roughly 1986, and I cursed the design until we retired the application in 1996! This is a subtle kind of error that often sneaks up on the inexperienced data modeler, or one who has never taken a significant application from "cradle to grave" through the life cycle. Silly me, I assumed that years always had twelve months!

While it seems like you can choose to normalize a table in many different ways and still reach a manageable form, that is rarely if ever the case. The "across" form of the database violates first normal form as described at http://databases.about.com/library/weekly/aa081901a.htm?terms=normalization because it has more than one kind of expense going across the rows. Naming one column "salary" and one "rent" doesn't make them any different from "expense1" and "expense2" from the standpoint of normalization.

The fundamental issue from my perspective isn't even the normalization. Hardware is cheap. Wetware (people) is expensive! You can afford enormous amounts of disk I/O to support a more flexible design, because disk latency is measured in milliseconds while schema changes are measured in hours (for VERY small changes) to months (for normal-size changes). The "go down" schema will support an arbitrary number of expense types, which can be changed at runtime by defining a new type. The "go across" schema can support only whatever existed when the application was written, meaning that any new or deleted expense type requires a person to find and modify every piece of affected code!
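The flexibility argument can be sketched in a few lines (in-memory SQLite again, my choice of vehicle; the "utilities" type and its value of 75 are hypothetical, invented for illustration):

```python
# With "go down", a new expense type is just another row; with "go across",
# the table itself must change before the value can be stored at all.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("CREATE TABLE go_down (month TEXT, expense_type TEXT, amount INTEGER)")
cur.execute("CREATE TABLE go_across (month TEXT, salary INTEGER, rent INTEGER)")

# go_down: no DDL needed -- any application can add a new type at runtime.
cur.execute("INSERT INTO go_down VALUES ('jan', 'utilities', 75)")

# go_across: a schema change (and matching code changes) must come first.
cur.execute("ALTER TABLE go_across ADD COLUMN utilities INTEGER")
cur.execute("INSERT INTO go_across VALUES ('jan', 500, 200, 75)")

print(cur.execute("SELECT expense_type FROM go_down").fetchall())
print(len(cur.execute("SELECT * FROM go_across").fetchone()))
```

The INSERT into go_down would have worked on day one; the INSERT into go_across only works after the ALTER TABLE, which is exactly the schema change the paragraph above warns about.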

If you are using an older database engine that is expensive in terms of primitive DML operations (like INSERT, UPDATE, and DELETE), you sometimes have to denormalize to accommodate the deficiencies of the engine. Older versions of DB2 and Oracle were notorious for these kinds of problems, but current versions don't carry much of a performance penalty.

A few engines also had problems with indexing, such that you had to keep row sizes small to improve performance. MS-SQL had this problem before the release of version 6.5, but for the most part this is a thing of the past too.

While there might be a small performance penalty in the execution of the SELECT, the cost in terms of application rewrites to support an additional expense type is staggering. For the folks who prefer to load their database into memory and write linked-list managers to improve performance, the difference might be significant. To a business manager who wants to write code once and then use it for months or years without a rewrite, the "go down" design is preferred hands down.
