I work for a university and we have built a data warehouse and two data marts for archived data. Currently we are using an export/import process to clone our online system to our test data warehouse.
Here is my system information: My database is Oracle 9i and operating system is Unix on Sun Solaris. We will go to an ODS in 2007.
We are trying to determine if we should do nightly incremental updates/inserts instead of cloning the production system. I know that incremental table updates are faster, but are there any cons to the process? Is there a possibility that the files could get out of sync?
- Do you use an export/import process to refresh your data warehouse?
- Do you do daily incremental updates?
- How long does it take to refresh your data warehouse/data marts?
- Do you create any summary data or materialized views?
- How long does it take to do backups?
- What is your enrollment?
- What about growth rates? How does that impact your system?
- What platform are you running on?
Thank you in advance for your time.
Wow, this is a long question that touches on a number of aspects of the types of architectural decisions that we normally consider when constructing or maintaining a data warehouse, as well as your basic question on whether to use export/import as a cloning method.
Let's start with import and export as a cloning method. First off I can tell you that it works; this is a solution that will work for what you are trying to do. However, not knowing how large your warehouse is, this may not be a viable option. Consider a warehouse that is 1 or 2 terabytes; the selection of export/import would not be viable. Options that we have used depend on the requirements. For example, for an enterprise data warehouse, we use a number of system-level options. We use either Oracle's transportable tablespaces or we use an EMC feature to quick-copy disks. Both of these are effective solutions, since they minimize the impact to the database, but these may be more appropriate to a larger warehouse. Another option is replication, but it has not been one that I generally would use for a test database. As you discuss you are concerned about posting updates; I can understand this as you will now need to build an ETL structure just to update your testing database. Sometimes I think we complicate our lives unnecessarily.
As far as the type of data warehouses I work on, the current one is an enterprise warehouse that integrates multiple lines of business. The warehouse is currently multiple terabytes and has seen significant growth in the past while it resides on an HP 12-way Itanium with an EMC array. We perform all of our loading via Informatica, so we utilize direct database access as well as bulk load methods, depending on the transactions that we are processing. All of our processing is delta processing. We capture changes in the sources and then post them to the warehouse; our load window is now at about five to eight hours depending on the daily volumes. As for backups, I do not have a time on this as I leave that to operations. As far as refreshing our development and test databases, we used export/import the first time and that took us three days. We have since moved to a simple Informatica solution that completes in about 12 hours.
So I hope that helps you. If you have any follow-up questions please let me know as I am happy to help. Good luck.
Dig Deeper on Oracle data warehousing
Related Q&A from Ian Abramson
What are the pros/cons of custom development vs. an ETL tool? The question of custom code versus the use an ETL tool is one that we are faced with ... Continue Reading
ETL tools provide very robust support for complex business logic. Continue Reading
I am evaluating ETL tools for the organization. Considering that I am looking at three or four tools, how do I rate these and put a weight-age for ... Continue Reading
Have a question for an expert?
Please add a title for your question
Get answers from a TechTarget expert on whatever's puzzling you.