Problem solve Get help with specific problems with your technologies, process and projects.

Four methods for designing a data warehouse/data mart environment

There are (at least) four different ways to design your data warehouse/data mart environment. Learn how to choose your design and what to consider when designing your data warehouse in this tip.

There are (at least) four different ways to design your data warehouse/data mart environment. Which one you chose to use should not really make a difference. Having stated that, I expect a lot of hate mail. That's fine. Go ahead. But wait until after you have read the rest of the story.

The four ways to design your data warehouse/data mart (DW/DM):

  • Normalized Data Structures
  • Multi-Dimensional Database Structures (cubes, etc.)
  • Star Schemas (a group of stars are constellations)
  • Snowflake Schemas (a group of snowflakes make a snow storm)

While there may be other terms out in the marketplace, they are not really the strong players and are most likely a combination of the ones above.

When designing your DW/DM, you need to take into account many things:

 

  1. direct access to the data by end-users
  2. the purpose of your DW/DM
  3. the end-user tools that will be supported

Normalized Data Structures

This is a point of strong discussion between my good friend Ray Ferrara at KPMG and myself. We have had many length discussions over our favorite beverages (Coke for me!). Ray believes that normalized data structures are the only way to build the data warehouse. The data warehouse feeds the data mart into whatever type of environment is best for the end-user: multi-dimensional database, snowflake, or star. He believes in the true Wholesale/Retail data warehousing environment. In this environment, the end-user will not touch the data warehouse directly, much like we generally cannot purchase directly from a wholesaler. The data mart is where the end-user reads the data. I believe (and have implemented) many data warehouses where the designs are star or snowflake schemas (although I tend to lean toward the star!). I believe it will be difficult to provide every user with the data set the want to do analysis on. But this is a very common practice.

Star Schemas

Ralph Kimball is attributed to working with the team the created the concept of the star schema. He certainly is the person fully involved in the promotion of this concept. The basis of the star is that you have dimensions (character descriptions that end-users will like to look at measurements of the company by) and Facts (the actual measurements). Examples of dimensions are customer information (name, address, gender, postal code), product information (name, vendor, category, subcategory) and point of sale information (store, address, salesperson). Examples of facts are quantity sold, unit price, discount applied, tax amount, and total amount. The intersection of the dimensions is where the fact is. In fact (no pun intended), a star is a flat representation of a multi-dimensional cube.

This is popular because the performance is good (not as many joins as a normalized data structure and DBMSs are not yet good a dealing with 5+ way joins!) and is easily understood by our business users. You can, with most query tools, define the semantic layer to look like a star schema if you are using a normalized data structure, but you will still have to do lots more joins to get the data. The biggest negative against the star schema is that you have so much duplicate data. I, like Ralph, believe this is a non-issue, since dimensions, where most duplicate data arises, are smaller than fact tables, and thus take up minimal disk requirements as compared to the whole data warehouse.

Snowflake Schemas

I have often said that the snowflake schema is a star schema with normalized dimensions. Mostly I think snowflakes make data modelers feel better since it doesn't have duplicate data in the dimensions. The biggest problem that I have with the snowflake schema is that it will require more joins to provide the results to the end-users. As I said earlier, 5+ way joins are very difficult for most DBMS optimizers to deal with, so having fewer joins are will provide better performance.

Multi-Dimensional Databases

There is a group of products that store data in a proprietary format that provides extremely quick access to data and its functions. They represent the data as a cube. The good thing about multi-dimensional databases is they are fast for drill down, drill up and drill across. The negative centers on what happens when the dimensions change - do I have to rebuild the whole cube? Other questions center on whether multi-dimensional databases are able to hold all the data required for your DW/DM environment. Vendors say yes, but I am not yet convinced.

Conclusions

I think the most important part of defining the data warehouse/data mart environment is that you design the data that best fits the tool your end-users are using. If your tool works best in a snowflake schema, build them a snowflake. If you are never going to let the end-users access the data warehouse and you feel more confident in your abilities of building a normalized data structure, then build that. However, don't discount the star schema. It has proven itself in many data warehouse/data mart environments. I believe that the star schema is the best general-purpose design for the data warehouse/data mart environment.

 

 

About the Author

Chuck Kelley is president and founder of Excellence In Data, Inc. and an internationally known expert in database technology. He has more than 20 years of experience in designing and implementing operational/production systems and data warehouses. Kelley has worked in some facet of the design and implementation phase of more than 35 data warehouses and data marts. He also teaches seminars, co-authored a book with W. H. Inmon on data warehousing and has been published in many trade magazines on database technology, data warehousing and enterprise data strategies. Please feel free to email him at chuckkelley@usa.net with comments (negative or positive) about this column or ideas for future columns.

 

For More Information


 

Dig Deeper on Oracle data warehousing

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.

-ADS BY GOOGLE

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide.com

SearchDataCenter

SearchContentManagement

SearchHRSoftware

Close