The ability of organizations to apply business intelligence (BI)--to make strategic decisions based on information available in operating and accounting systems--has become a necessity to compete in today's business environment. But gathering the data needed to apply business intelligence puts a great strain on an organization's already existing database and data warehousing systems. For example, BI-driven applications such as customer relationship management (CRM) require repeated analysis of detailed transaction histories on perhaps millions of customers. Queries by financial analysts, marketing researchers, strategic planners and the like can increase processing loads to levels that can drive even the most robust database to its knees.
At the same time the need for BI is increasing the quantity of data that must be massaged and analyzed, the amount of time that businesses can wait to receive meaningful information is decreasing. Several years ago, an analyst entering a query into the data warehouse would expect an overnight response. Today, response time must be almost instantaneous to be meaningful. Business intelligence that arrives too late is no longer useful.
Several Approaches
There are several approaches that companies can undertake in order to improve database performance to support their business intelligence needs. One popular approach is database tuning: attempting to optimize performance by "tuning" elements such as database table structures or the sequences in which query instructions are executed. It's possible that database tuning can improve performance by 20-200%, but most often tuning provides more modest gains. Also, database tuning is expensive, requiring extremely skilled database experts.
Other approaches to improving database performance can be divided into three general categories: strategic, tactical, and operational. According to Bill Inmon, the greatest performance gains can be achieved by using a strategic approach, with a 200% to 20,000% performance gain possible. Tactical approaches can improve performance by 20%-200%. Finally, operating performance gains are in the neighborhood of 5%-20%.
While the strategic approach can provide the biggest jump in performance, it also requires the biggest change to an organization's database infrastructure. For example, the strategic approach includes such costly activities such as upgrading the hardware platform to get the processing horsepower needed, creating data marts or exploration warehouses to off-load some of the processing from the main database, or redesigning the structure of the database or data warehouse.
Companies most often apply a tactical approach to solve problems such as slow query response time or other end-user performance problems. Tactical approaches include indexing, creating summary tables, or merging tables, creating a project warehouse, and sampling. Although the cost of tactical approaches tends to be less than those of the strategic approaches, the disadvantages to tactical approaches are that these solutions are often complex, require on-going maintenance, and are not easily retrofitted into an organization's existing database environment.
Operational performance approaches focus on tweaking the database system by using techniques such as repeated query detection, basic indexing and data warehouse purging. Although performance gains are less than 20%, the operational approach requires the least cost and amount of tinkering with the databases.
What's Needed: The Case for Aggregation
As described earlier, some of the approaches to improving database performance are expensive, others are hard to maintain, complex to implement, and require far-reaching changes to the end user environment. And, most of these approaches provide only marginal improvements.
To adequately support business intelligence needs, companies need a way to improve database performance without high costs. To keep costs down, the performance solution must be compatible with and not impact the organization's existing databases, data applications, or user operations. The solution must also be highly scalable to support growth with little additional cost.
In addition, the solution must improve performance dramatically. In very large databases, the factor most affecting performance is the data aggregation function. According to noted data warehouse expert Ralph Kimball, providing a proper set of aggregates can increase database speed by a factor of 1000. Improved data aggregation can deliver the performance enhancements organizations require.
The HyperRoll Solution
A new high speed, high performance data aggregation and caching engine from HyperRoll increases performance dramatically. It is a stand-alone software based product that can be incorporated transparently into almost any database or data warehouse environment. HyperRoll is designed around a proprietary, high-performance aggregation algorithm and a breakthrough search and storage methodology which optimizes aggregation and query performance for almost any database application, without the usual high cost. One of the major strengths of HyperRoll in these uncertain economic times is that it maximizes current investment in systems and applications.
HyperRoll consists of three layers: interface, connectivity, and data. The interface layer is the front-end that connects the aggregation and caching engine with the existing database server, OLAP server, reporting tools and/or third party applications. The connectivity layer provides the data access method. The data layer consists of two sub-layers: HyperRoll engine and storage.
The high-performance engine handles client's requests and retrieves the appropriate data from the cache, which can contain pre-calculated data. The data layer contains logic that decides whether to use pre-calculated data or perform on-demand calculation to provide optimal response time.
Customers have recorded orders of magnitude performance improvement in aggregation and query operations after installing this solution. A five-million record database in a telecom firm that had originally taken eight hours for data to fully aggregate took less than nine minutes after HyperRoll was integrated into the environment. A 250,000 record database in a manufacturing application, which had taken 7 hours for data aggregation, took only four minutes using this new solution. Best of all, the system provides superior performance in all analytical functions. Right away, organizations gain measurably faster aggregation and query response times.
Transparent implementation means the system can be installed with no impact to existing database, data, applications or user operations. It is compatible with most relational and multidimensional database systems and applications and supports common UNIX platforms as well as Microsoft Windows NT/2000.
HyperRoll is highly scalable so performance is ensured as needs change. It is highly effective in improving performance in all size of database or data warehouse systems even the very large data warehouse (VLDW). Parallel aggregation and analysis tasks can be performed on single and multiprocessor machines or in clustered networks. Large data sets can be subdivided among processors, allowing linear scalability and much faster data aggregation. Data can be partitioned on different servers or disks.
In the relational database environment, users see the HyperRoll solution as a standard database view. Multidimensional database users see the data as a standard data cube in the database. Users have the flexibility of performing full or partial aggregation tasks based on specified parameters. HyperRoll functionality includes performing aggregation operations such as sum, count average, performing analytical operations, or "what if" (virtual cube) analysis.
Summary
With the flood of information and the need to analyze it causing major choke points in today's information systems, organizations are looking for ways of dramatically improving database/data warehouse performance. Cost-effective solutions that provide a superior option to tuning, hardware upgrades and other approaches will clearly emerge as the tools of choice for embattled database industry professionals. Technology such as HyperRoll's that can deliver the break-through in performance improvements will provide the key to coping with today's information onslaught.
About the Author
Reuven Bakalash, Ph.D. is the Chief Technology Officer at HyperRoll, Inc.