When moving data from a source database to a data warehouse, the transformation phase is followed by the loading phase. Loading looks straightforward, but there are several things to consider before making the move, according to Ralph Kimball's book The Data Warehouse Lifecycle Toolkit (Wiley Computer Publishing):
The capabilities you need during the data loading process are, in large part, a function of the target platform. Some of these capabilities are:
1. Support for multiple targets. The atomic data mart may be on one DBMS, and the business process data marts may be on another. Each target will probably have its own syntax and idiosyncrasies, and your load process should know about these differences and use or avoid them as appropriate.
2. Load optimization. Most DBMSs have a bulk loading capability that includes a range of features and can be scripted or invoked by your data staging tool through an API. Every database product has a set of techniques and tricks that optimize its load performance. These include steps like avoiding logging during loads and taking advantage of bulk loader capabilities like creating indexes and aggregates during the load.
3. Entire load process support. The loading services also need to support requirements before and after the actual load, like dropping and re-creating indexes and physical partitioning of tables and indexes.
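As a rough illustration of items 2 and 3, the sketch below wraps a batched load with a pre-load step (dropping an index) and a post-load step (re-creating it). It uses Python's built-in sqlite3 module purely as a stand-in for a real warehouse DBMS; the table `sales_fact`, the index `idx_sales_customer`, and the function `bulk_load` are hypothetical names chosen for this example. A production load would more likely call the target platform's own bulk loader, whose options differ from product to product.

```python
import sqlite3


def bulk_load(conn, rows):
    """Load rows into the hypothetical sales_fact table.

    The index is dropped before the load and re-created afterwards,
    so it is built once instead of being maintained row by row.
    """
    # Pre-load step: drop the index (name is an assumption for this sketch).
    conn.execute("DROP INDEX IF EXISTS idx_sales_customer")

    # Load step: one batched insert inside a single transaction.
    with conn:
        conn.executemany(
            "INSERT INTO sales_fact (customer_id, amount) VALUES (?, ?)",
            rows,
        )

    # Post-load step: re-create the index in one pass over the loaded data.
    conn.execute(
        "CREATE INDEX idx_sales_customer ON sales_fact (customer_id)"
    )
    conn.commit()

    # Return the row count so the caller can verify the load.
    return conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]


# Set up an in-memory target that already has the table and its index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (customer_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_customer ON sales_fact (customer_id)")

loaded = bulk_load(conn, [(1, 9.99), (2, 24.50), (1, 5.00)])
print(loaded)
```

Supporting multiple targets (item 1) would typically mean keeping the drop, load, and re-create statements per platform rather than hard-coding one dialect as this sketch does.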