Problem solve Get help with specific problems with your technologies, process and projects.

The data model for the data warehouse

In general, should the data model for the data warehouse
  1. be denormalized
  2. contain the subset of operational system data
  3. be similar to the operational system's model?

It is important to understand the multiple roles your data warehousing environment needs to play. A well-architected data warehouse will have an intake layer, supporting the input role. That is, to take data in from the various sources and place that data into what I and several of my colleagues to refer to as the Persistant Storage Area (PSA). The PSA is highly normalized, subject oriented, reflects the enterprise view of your data and contains an audit-trail level of historical detail. In some cases one or more subject areas within the PSA may reflect the operational systems models. This design decision is driven by the trade-off for pure enterprise-level abstraction vs. complexity of the ETL to get the source system into the PSA. So, PSA architectural layer of the data warehousing environment will be a normalized enterprise-wide view of your data and will have the potential to reflect your operational systems models on a case by case basis. The PSA will become the single source of record to feed information into all other architectural layers in the data warehousing environment.

Another critical role the data warehoue must support is ease of user access. User access will fall into two broad categories of behavior; 1) decision support & reporting and 2) exploratory/data mining. The decision support & reporting behavior is very well supported through the delivery of data via data marts that contain the information needed by the consumer to support their typical decision making and management information needs. Data marts can be designed using star schemas (highly denormalized), multidimensional cubes, and even hybrid denormalized/normalized structures that support things like list management. The exploratory/data mining behavior is well supported through the delivery of data via "profile" oriented designs that take the universe of data elements deemed needed to perform exploration and lay them out in "wide" profile oriented records. So the user access architectural layers of the data warehousing environment will be denormalized sets of information designed to meet the requirements of the primary consumer of that information. The single source for providing information into the user access layer is the PSA.

As your data warehouse grows, via incremental design and construction, it will mature from containing a subset of the enterprise data to an ever closer complete repository of all enterprise-wide data built upon the enterprise data model. Each data warehouse project or initiative should be designed with a succinct scope. The goal of each initiative is to build out and fill in data that represents 1 or more of the enterprise subject areas. Define each increment based on the business requirements for information access, work backwards to source the elementary data that must reside in the PSA in order to provide the data to fulfill those requirements. Whenever feasible use a triage approach when sourcing data into the PSA. For example: your first data warehouse project is to provide sales management with a set of sales performance metrics delivered through a sales performance dashboard. One of the key dashboard items will be the top 10 customers for the reporting period. They want to see the customer's name, their assigned customer segment, and their geographic location. Since we have to go to the source systems customer master table to obtain the customer name and place it into the PSA's customer table, we might as well map the rest of the business information columns from the source system's customer file to the PSA customer master table. With today's ETL tools, it will take your ETL architects essentially the same level of effort to map 20 columns from a single table as it would to map two. And you just anticipated future unknown needs. The PSA now contains data elements the sales managers haven't thought about yet, but as soon as they do, you will already have them in your data warhousing environment.

Dig Deeper on Oracle business intelligence and analytics

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.