BACKGROUND IMAGE: iSTOCK/GETTY IMAGES
Why is it important for organizations to prioritize their data, and how can Oracle shops do it if they don't yet have Oracle Database 12c with its Heat Map tool?
John Kanagaraj, database engineer, PayPal: Organizations today generate a lot of data. There is data coming in from sensors, from human beings, there is the Internet of Things. All this data goes into a humongous Oracle Database, and once it goes in there, people don't really track it, and users don’t really care. That is because today the thought is that disk is cheap, and just store everything that you need.
But this does present a problem for organizations because you cannot continue to keep data forever and forever. Another problem is that when put that data into the production database, when you want to make a test or stage database, you have to duplicate by as many copies as you want. You cannot back it up effectively and you cannot move it around that effectively, so organizations need a way to identify the data that is really useful, the data that is not really useful right now but that we need to keep around for compliance reasons, and then how to separate the two.
Oracle Database 12c Heat Map does that. It determines which part of the data is hot and which is cold. Then there is another tool called Automatic Data Optimization. It's a policy-building tool that allows you to compress the data that is not really needed right now but which needs to be kept around for compliance and other reasons.
But that is available only in 12c. We are in a situation where 12c was delayed last year and so we were not able to move to it. So we are looking at something in the intermediate to manage the data. The way we do that is, this information is already available. So if you dig into AWR, which is the Automatic Workload Repository, you will see a section that says "segment usage" or "segment statistics." The segment statistics say which table, which user, which partition, and which indexes are being used and what is the usage. Are there database writes to this? Are there database reads to this? All of this information is already available.
What I'm looking at is using this information from AWR determine the hot and the cold areas. Then you can use partitioning to split out this data by some kind of range partitioning, usually based on time. When you do that, you're able to take the cold data, the cold partitions and compress them and move them away to another tablespace or move them away off database and then keep the main database lean and mean.
We need to be able to do this. Storing data is expensive and storing data for a long time has become a problem, but we still need to keep this data for compliance reasons and other reasons.