Exadata was initially released at the 2008 Oracle OpenWorld conference. Five years and thousands of implementations later, I still rarely see the full potential of the Exadata architecture unlocked. Companies have long mismanaged and misused Oracle on non-Exadata database appliances, and now the same is occurring on Exadata.
Despite being underutilized, the Exadata architecture offers many benefits. Most implementations get a significant boost in performance from the hardware refresh and the reference architecture. Removing the database data from the enterprise SAN often results in a huge improvement. Having a dedicated set of disks, with a very large SSD cache in front, removes a large percentage of the common bottlenecks in database disk IO operations.
However, organizations rarely take the extra steps to fully leverage the capabilities of the platform. Exadata can be a game changer when it comes to handling both large bulk operations and millisecond-range response time on the same data. The crucial phrase in that last sentence is "on the same data."
A key technology exclusive to the Exadata architecture is the very advanced compression feature called Hybrid Columnar Compression (HCC). This technology is not to be confused with columnar storage. Although it is very similar in nature, it doesn't share some of the challenges associated with true columnar storage. HCC employs multilevel compression with different algorithms applied to each column in order to maximize compression ratios. A typical compression ratio is 10:1, but with highly repetitive data a 50:1 ratio is also possible, as is 80:1 in some extreme cases.
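To make the per-column idea concrete, here is a minimal sketch in Python. It is not Oracle's implementation: the three standard-library codecs are stand-ins for whatever algorithms HCC uses internally, and the sample rows, function names and newline-based serialization are assumptions made purely for illustration. It pivots rows into columns and keeps, for each column, whichever codec produces the fewest bytes.

```python
import bz2
import lzma
import zlib

# Candidate codecs from the Python standard library (stand-ins, not Oracle's).
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress_column(values):
    """Serialize one column and keep whichever codec yields the fewest bytes."""
    raw = "\n".join(values).encode("utf-8")  # assumes values contain no newlines
    best_name, best_blob = min(
        ((name, enc(raw)) for name, (enc, _) in CODECS.items()),
        key=lambda pair: len(pair[1]),
    )
    return best_name, best_blob, len(raw)


def decompress_column(name, blob):
    """Reverse compress_column using the codec recorded for that column."""
    raw = CODECS[name][1](blob)
    return raw.decode("utf-8").split("\n")


def compress_rows(rows):
    """Pivot rows into columns, then compress each column independently."""
    columns = [list(col) for col in zip(*rows)]
    return [compress_column(col) for col in columns]


# Hypothetical sample: an id column plus two highly repetitive columns.
rows = [(str(i), "ACTIVE", "CA" if i % 2 else "US") for i in range(10_000)]
compressed = compress_rows(rows)
raw_bytes = sum(raw_len for _, _, raw_len in compressed)
packed_bytes = sum(len(blob) for _, blob, _ in compressed)
print(f"raw={raw_bytes} packed={packed_bytes} ratio={raw_bytes / packed_bytes:.1f}:1")
```

Grouping similar values together is what gives the codecs long repetitive runs to work with, which is why the repetitive columns in this toy data shrink far more than the id column does.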
Let me explain with an example from a recent implementation for which I was part of the data model and ETL design committee. We were faced with the challenge of converting a relatively unstructured data set of about 2 TB into a structured, well-defined data set. The goal was to move the user-facing application from daily reports sent over email to real-time, subsecond responses with drill-down capabilities. Using the well-known star schema design, the resulting fact table held 17 billion rows, yet it took only about 400 GB of space. Because we were able to fully leverage Oracle's advanced HCC, we reached an average compression ratio of 14:1. Without it, that table would have been 5,600 GB, or 5.6 TB. By designing our ETL to be HCC friendly, we were able to maximize its benefits and process more than 100 million records per day with near-real-time latency.
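The arithmetic behind those figures can be checked with a few lines of Python. The row count, table size and ratio come from the example above; the per-row byte figures are derived here and assume decimal units (1 GB = 10^9 bytes), which is the convention implied by 5,600 GB equaling 5.6 TB.

```python
# Figures quoted in the example above.
COMPRESSED_GB = 400
COMPRESSION_RATIO = 14
ROW_COUNT = 17_000_000_000

BYTES_PER_GB = 10**9  # assumption: decimal gigabytes

# Size the fact table would occupy without HCC.
uncompressed_gb = COMPRESSED_GB * COMPRESSION_RATIO
uncompressed_tb = uncompressed_gb / 1000

# Derived averages (not stated in the article).
bytes_per_row_compressed = COMPRESSED_GB * BYTES_PER_GB / ROW_COUNT
bytes_per_row_uncompressed = uncompressed_gb * BYTES_PER_GB / ROW_COUNT

print(f"uncompressed size: {uncompressed_gb} GB ({uncompressed_tb} TB)")
print(f"average row, compressed:   {bytes_per_row_compressed:.1f} bytes")
print(f"average row, uncompressed: {bytes_per_row_uncompressed:.1f} bytes")
```

In other words, each fact row averages roughly 24 bytes on disk instead of roughly 330.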
And it's more than just disk storage. It's also about cache space. Because the table was 14 times smaller, the same SSD and RAM cache layers could hold 14 times as many rows, making them 14 times more effective. And by using the smart scan technology that is part of the Exadata architecture, we were able to leverage very high scan speeds that allowed bulk reports to be produced within reasonable time frames.
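A rough way to picture the cache effect, as a sketch only: the 200 GB cache size below is a hypothetical number chosen for illustration, not a figure from the project, and the model ignores everything except how much of the table fits in the cache.

```python
TABLE_COMPRESSED_GB = 400  # from the example above
COMPRESSION_RATIO = 14     # from the example above
CACHE_GB = 200             # hypothetical cache size, for illustration only


def cached_fraction(table_gb, cache_gb):
    """Fraction of the table that fits in the cache, capped at 100%."""
    return min(1.0, cache_gb / table_gb)


table_uncompressed_gb = TABLE_COMPRESSED_GB * COMPRESSION_RATIO
with_hcc = cached_fraction(TABLE_COMPRESSED_GB, CACHE_GB)
without_hcc = cached_fraction(table_uncompressed_gb, CACHE_GB)

print(f"with HCC:    {with_hcc:.1%} of the table fits in cache")
print(f"without HCC: {without_hcc:.1%} of the table fits in cache")
print(f"improvement: {with_hcc / without_hcc:.0f}x")
```

The full 14x gain applies only while the cache is smaller than the compressed table; once the whole table fits, a larger cache adds nothing further.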
Exadata's ability to present the same data for both high-speed bulk processing and millisecond-precise record lookups via a variety of index choices is a unique and valuable feature.
I've seen countless companies struggle to solve this same problem, often by attempting to use a variety of open-source products, such as Hadoop, HBase, Hive, MongoDB, and CouchDB. The SQL-like capabilities in open-source platforms are desirable because internal staff is often familiar with them. These efforts usually achieve their goals, but only with extensive research and development.
In almost all cases, the same problems could have been easily solved by fully leveraging Oracle's Exadata architecture. So what's holding organizations back? The cost of the platform and licensing is often given as one of the reasons. However, the cost of all the research and development time spent on the alternatives should not be overlooked or underestimated. Projects that can be completed in a few months on Oracle's database can take 12 to 24 months on other platforms.
Another reason organizations shy away from Exadata is that they have had negative experiences with relational databases. Sometimes just one bad experience can cause an organization to shut the door on relational databases for good and seek other possibilities. In doing so, however, organizations give up the arsenal of features a relational database offers, including backups, replication, automatic summary tables, extensive indexing and more.
To help companies recognize the benefits they might be missing, I'm beginning to work on a new series that focuses on the various types of problems that organizations face when they steer off the tracks into non-RDBMS-type products. You can follow this series on the Pythian blog.
About the author:
Christo Kutrovsky is a senior consultant at Pythian, a global data management consulting company that specializes in planning, deploying and managing mission-critical data infrastructures. He is also an Oracle ACE with a deep understanding of databases, application memory and input/output interactions. He has delivered presentations at the Independent Oracle Users Group conference, Oracle OpenWorld and other industry conferences.