2. Always choose your hardware platform first. The reason for this is the OS can in most cases be dedicated by the hardware manufacturer. Then determine which disk manufacturers are compatible with you selected server platform. Most will work through a SAN (storage area network) and therefore will always be compatible. But if the storage is to be connected directly to your server make sure that they are compatible.
3. Determine the amount of secondary storage that is required for each of the different layers of the warehouse and determine if your selected server manufacturer can attach to that amount of disk storage. If not, you need to consider another hardware vendor.
4. Determine the importance of disk speed that is required for each layer of the warehouse. Some layers can utilize slower disk arrays while other will support faster arrays. The closer the technology is to the end user the faster the technology needs to be. Make sure that the disk and server manufacturer are compatible for those specified disk amounts. If not, decide which manufacturer is to survive and document all reasons for changing. Choose a replacement vendor and repeat.
5. Once the server and disk manufacturers are decided, perform the following prior to committing to the purchase order.
a) How much room is available for growth in CPU and memory? Is the selected server configuration already maxed out? If it is expected that more then 70% of initial configuration is going to be utilized then the configuration should be reworked. If the CPU and memory can't be increased then consider a larger server.
b) How much disk space is going to be used during the first 3 months of production? Most initial estimates on disk space are 20-30% off. If the warehouse takes off, can additional disks be inserted into the disk farm? Consider selecting a array solution that is 30-50% utilized. Disk is cheap when inserted into an existing frame, but very expensive when bringing in additional frames.
c) What is the impact on the network when installing the additional servers and disk arrays? Consider additional network cards to handle moving data between servers.
6. What are the backup/restore requirements of your warehouse? Since the warehouse can have a huge disk farm the existing backup devices may be inadequate. Selecting the right recovery vendor depends on the size of the warehouse, the frequency of change for each layer, the volume of change at each layer, and desired speed of the backups. Also, don't forget the methodology of incremental backups in your selection. This can sometimes keep some of the smaller recovery vendors on your list.
NT vs. UNIX
No comparison when it comes to larger volumes. .5 terabyte warehouses run better on UNIX under an Oracle database engine. NT and SQL Server 2000 are raising their ceiling, but I wouldn't want to be known as the shop with the largest NT/SQL Server database. Just too risky when it comes to reliability.
BAG co-owner Mike Thornton contributed to this answer.
Dig Deeper on Oracle data warehousing
Related Q&A from Mike Lampa
When trying to design a data warehouse, we often try to model the database on the operational data model. Are there any guidelines in trying to ... Continue Reading
What is a surrogate key in a table? Continue Reading