Problem solve Get help with specific problems with your technologies, process and projects.

Retaining constraints on the fact table

Following up on your answer to whether to put constraints (primary and foreign keys) on the fact table, here's something I'm quoting from Oracle8 Data Warehousing by Gary Dodge and Tim Gorman:

Page 404: "Many people new to data warehousing come from the world of 'operational' system. Inevitably, people who are used to creating data models for the 'operational' system consider the use of integrity constraints such as primary key, unique key and foreign key to be mandatory for all relational databases.

"But, this is not true for the data warehouses..."

"Data warehouses are not tasked with maintaining referential integrity. They exist to provide a reporting base for the data as it exists. If there is a referential integrity error on the source operational system, then this same error should be propagated to the data warehouse..."

Please share your response on retaining contraints on the fact table.
I believe that everyone is entitled to their opinion, it makes for a great discussion. Mr. Dodge and Mr Gorman are both extremely knowledgeable members of the data warehouse community and I truly respect their opinion of using RI in a data warehouse. I can understand their reason for doing it, however, I believe otherwise.

Referential integrity is like insurance. If everything goes well there was no reason to have it, but if things go wrong it may become a problem. RI provides us with data insurance. When loading a data warehouse I believe you have two choices. The first is to load data completely and the other is to load it accurately.

Tim and Gary have chosen complete loads. This option is good if you require complete data at the cost of reduced accuracy. When data lacks referential integrity, you risk reducing the value of data's dimensionality, by not linking your facts and dimensions. It will become difficult to drill into your information due to the existence of rogue data. You will find data that lacks integrity becomes lost in the reporting shuffle. However, if all you want to know is your total sales for the month, things are fine, but if you want to know the exact products, this could be a challenge, due to this data or lack of it.

The other choice is choosing accuracy; here you are choosing to achieve two distinct advantages. The first surrounds solid reporting, with no risk or erroneous or misleading reporting. Your data will tell the truth each time you look at it regardless of the grain of reporting or the timeframe. With RI, you ensure your data can be related without the risk of rogue data. Facts will always link to dimensions.

The other advantage is enabling business process improvements. It is my opinion that a data warehouse not only provides a great tactical solution for a business from a knowledge management perspective, but it also identifies problems with operational systems and processes. It is my experience that a warehouse can identify these problems and help to improve and quantify the improvements to operational data systems. One client where I worked found customer and billing records that had data issues in 25% of their records. The warehouse identified the problems with the processes within the business and enabled the fixes. Within three months the data and the processes where improved and then only 2% of records continued to have problems. Another example was a sudden appearance of duplicate records, which were captured by RI failures. It turned out that a bug with a new operational module. The problem was fixed within one day, all due to RI.

From a technical perspective, I realize that there is a cost in the area of performance, but the cost to your data and your business may prove more costly than the lost performance. Also increased disk space to accomodate more indexes also has its own associated costs. Insurance is not free.

I think the choice is clear. The ultimate choice is yours. I choose the safer route that may never show rewards, it may be due to my DBA background, which tells me to play it safe and cover all my bases. Data is an organization's most valuable asset, I want it to be right.

Dig Deeper on Oracle data warehousing

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.