Achieving Extreme Performance with Oracle Exadata, an Oracle Press book that came out in February, dissects the various features of the Oracle Exadata Database Machine. Rick Greenwald, one of the co-authors, talks in this Q&A about the mysterious aura around Exadata and what some of its most important features are.
You wrote in the book that Exadata has something of a mysterious aura about it. Why do you think that is?
Rick Greenwald: Oracle did something different with Exadata that we didn’t do with any of our other products: We brought it out and didn’t make documentation widely available. It’s difficult to get the documentation if you’re not running the machine.
And there is a reason for that. If you talk to the product managers, they say that it just works. And it does. Most of the secret sauce in Exadata is not really configurable. It’s not like other Oracle products where you can tweak it here and there. We built best practices into the boxes, so we don’t really want people changing them.
What do you think is the single most game-changing technology in Exadata?
Greenwald: The biggest is essentially offloading to the storage servers. Many storage arrays have intelligence built into them. But because we wrote software for both the database and storage servers, we’re able to offload database-aware information to storage. What that does is reduce the amount of data sent back to the database node. Then there is less work for the database server to do, and less traffic going between the database and storage.
Server appliances have been around for decades, so how is Exadata different?
Greenwald: There are three pillars that produce the power of Exadata. The first is optimal hardware, even if it is commodity hardware. So we have the latest multi-core CPUs. We have flash memory. We have InfiniBand.
The second thing, as I mentioned before, is proper best practices. It could be done everywhere, but it’s not done that often. There are all sorts of reasons why you don’t configure things properly yourself. Maybe you didn’t know that you needed multiple disk controllers. Maybe when storage needed to be expanded, you just added disks and didn’t expand the rest of the bandwidth path. I don’t think a lot of people start with bad configurations, but a lot of people end up with them.
The third thing is the software, and this is something no one has. Offloading processing to the storage servers is a significant piece of software. All of our software is database aware. If we can cut down the amount of data that is sent to the database server, we increase the effective network bandwidth and improve processing time. All of these things have effects down the line.
Can you talk more about the work Exadata does at the storage level?
Greenwald: One thing it does is offload predicate analysis, such as a WHERE clause, to the storage servers. So any rows that don’t match those conditions aren’t sent back to the database server.
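The idea of predicate offload can be sketched in a few lines of Python. This is an illustration, not Oracle's implementation: the row data, the `storage_scan` function, and the example predicate are all hypothetical, standing in for the filtering a storage cell performs before anything crosses the network.

```python
# Hypothetical sketch of predicate offload: the storage layer applies the
# WHERE condition locally and returns only matching rows, instead of
# shipping every block back to the database server.

def storage_scan(rows, predicate):
    """Simulate a storage cell: filter rows where they live."""
    return [row for row in rows if predicate(row)]

# Rows a storage cell might hold; names and values are illustrative.
rows = [
    {"id": 1, "region": "EMEA", "amount": 500},
    {"id": 2, "region": "APAC", "amount": 75},
    {"id": 3, "region": "EMEA", "amount": 20},
]

# Equivalent of: WHERE region = 'EMEA' AND amount > 100
matches = storage_scan(rows, lambda r: r["region"] == "EMEA" and r["amount"] > 100)
print(matches)  # only row 1 travels back to the database server
```

The payoff is exactly what Greenwald describes: two of the three rows never leave the storage tier, so the database server receives less data and does less work.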
It also does join filtering with Bloom filters. A Bloom filter stores values in a compressed fashion and will never return a false negative. In other words, using a Bloom filter may send back rows that are not used in a join, but it won’t eliminate a row that is needed for a join. That ensures data integrity.
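A minimal Bloom filter shows why the no-false-negatives guarantee holds: every bit set when a key was added is still set when that key is tested later, so a real join key can never be rejected. This toy version (the class name, sizes, and keys are all made up for illustration) is far simpler than anything in Exadata, but the property is the same.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=64, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into one integer

    def _positions(self, item):
        # Derive several bit positions from independent-ish hashes of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits & (1 << pos) for pos in self._positions(item))

# Build a filter over the join keys of one side of the join.
bf = BloomFilter()
for key in ["alpha", "beta", "gamma"]:
    bf.add(key)

assert bf.might_contain("beta")   # a key that was added is never rejected
bf.might_contain("delta")         # may be True (false positive) or False
```

A false positive only means an extra row is shipped back and discarded later by the database server; correctness is never at risk, which is the data-integrity point made above.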
Another feature is storage indexing. A storage index tracks the high and low values for columns in 1-megabyte sections of data. Exadata can use the values in the storage indexes to eliminate sections of data to retrieve. With the storage index, it doesn’t get the data it knows is not going to be there. Why even get it?
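The min/max tracking behind storage indexes can be sketched as follows. This is a loose analogy, not the real on-disk structure: here a "region" is just a handful of list entries rather than 1 MB of column data, and the function names are invented for the example.

```python
# Sketch of a storage index: for each region of data, remember the low and
# high value of a column, then skip regions that cannot contain the target.

REGION_SIZE = 4  # rows per "region" in this toy; Exadata tracks 1 MB sections

def build_storage_index(values, region_size=REGION_SIZE):
    """Return (start_offset, low, high) for each region of the column."""
    index = []
    for start in range(0, len(values), region_size):
        chunk = values[start:start + region_size]
        index.append((start, min(chunk), max(chunk)))
    return index

def regions_to_scan(index, target):
    """Only regions whose [low, high] range could hold the target are read."""
    return [start for start, low, high in index if low <= target <= high]

values = [3, 7, 5, 6,   40, 42, 41, 44,   90, 95, 91, 99]
index = build_storage_index(values)
print(regions_to_scan(index, 42))  # two of the three regions are skipped
```

That is the "why even get it?" logic: the query for 42 never touches the regions whose ranges are 3–7 and 90–99, so that I/O simply doesn't happen.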
Finally, another important feature is something called the Input/Output (I/O) Resource Manager. We’ve had Database Resource Manager, where you can set up different groups of users with different rules; that works for CPU. I/O Resource Manager works the same way for I/O bandwidth. You can have two different databases, test and production, and you can say that you want production to get 90% of the bandwidth.
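The 90/10 split described here amounts to proportional allocation, which a short sketch can make concrete. To be clear, this is not how Oracle's I/O Resource Manager is implemented or configured; the function and share values are hypothetical, showing only the arithmetic behind a share-based plan.

```python
# Illustrative only: divide available I/O bandwidth between databases in
# proportion to configured shares, e.g. production 90, test 10.

def allocate_bandwidth(total_mb_per_s, shares):
    """Split total bandwidth across databases by their relative shares."""
    total_shares = sum(shares.values())
    return {db: total_mb_per_s * s / total_shares for db, s in shares.items()}

plan = allocate_bandwidth(1000, {"production": 90, "test": 10})
print(plan)  # {'production': 900.0, 'test': 100.0}
```

In practice such plans matter most under contention: when both databases are busy, production is guaranteed its larger slice instead of competing on equal terms with test workloads.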