Data integration is a difficult proposition, but help is coming in the form of a relatively new approach to information management called data virtualization, according to one expert.
The typical enterprise today runs multiple types of database management systems (DBMSs), such as Oracle and SQL Server, which weren't designed to play well together, according to Noel Yuhanna, an IT analyst with Cambridge, Mass.-based Forrester Research Inc. What's more, he adds, these systems are increasingly being used to store nontraditional types of information such as unstructured and semi-structured data.
Combine those issues with the fact that -- as a result of the proliferation of data-retention regulations like the Sarbanes-Oxley Act -- enterprises today are seeing an unprecedented rise in the amount of data they store, and you've got one heck of a data integration challenge on your hands.
"Data integration is getting harder all the time, and we believe [one of the causes] of that is that data volumes are continuing to grow," Yuhanna said. "[But] you really need data integration because it represents value to the business, to the consumers and to the partners. They want quality data to be able to make better business decisions."
Data virtualization, also referred to as Information-as-a-Service and Data-as-a-Service, promises to ease the impediments to data integration by decoupling data from applications and storing it in the middleware layer.
Data virtualization can essentially be thought of as a service-oriented architecture (SOA) for data, according to Yuhanna. But where the traditional SOA approach has focused on business processes, data virtualization focuses on the information that those business processes use.
"Data-as-a-Service is something which is growing in terms of importance because if you have 10,000 databases, where is the single truth lying in those 10,000 databases?" Yuhanna said. "Does the application know? The answer is probably no. So you want to go to the single source of truth. If the applications interact with one virtualized layer, then that would make sense because you have consistent data -- you have quality data made available for the application."
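To make the idea of a single virtualized layer concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's product or API: the `VirtualDataLayer` class, the `crm_db` and `billing_db` dictionaries, and the field names are all hypothetical stand-ins. The point it shows is that applications ask the layer for a logical field, and the layer alone knows which backend is treated as authoritative for it.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-ins for two separate DBMSs holding overlapping customer data.
crm_db = {"c1": {"phone": "555-0100"}, "c2": {"phone": "555-0111"}}
billing_db = {"c1": {"phone": "555-0199", "balance": 42.0}}

# A fetcher takes a record key and returns that record, or None if it is absent.
Fetcher = Callable[[str], Optional[dict]]


class VirtualDataLayer:
    """Single access point that hides which backend holds each field."""

    def __init__(self) -> None:
        # Maps a logical field name to the one source treated as authoritative.
        self._sources: Dict[str, Fetcher] = {}

    def register(self, field: str, fetcher: Fetcher) -> None:
        """Declare which backend is the source of truth for a field."""
        self._sources[field] = fetcher

    def get(self, field: str, key: str):
        """Return the field value from its authoritative source, or None."""
        record = self._sources[field](key)
        return None if record is None else record.get(field)


layer = VirtualDataLayer()
layer.register("phone", crm_db.get)        # assumption: CRM owns phone numbers
layer.register("balance", billing_db.get)  # assumption: billing owns balances

print(layer.get("phone", "c1"))
print(layer.get("balance", "c1"))
```

In this sketch the billing system's conflicting copy of the phone number is never consulted, because the application no longer chooses a database itself.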
But getting to the point where applications interact seamlessly with virtualized data on the middleware layer isn't easy, and it requires that companies pay close attention to data quality and application performance.
With multiple DBMSs running in the typical enterprise, information is very often duplicated through replication and extract, transform and load (ETL) operations, Yuhanna explained. Getting the correct, or truthful, data onto the middleware layer therefore means that the functionality that ensures data quality must be built into the middleware layer as well.
"Data quality is a big factor because you want to have consistent values," he said. "Traditionally, data quality was more of an offline approach, or built into the application. But now we are looking at scenarios where this is coming into the middleware. The middleware is going to be doing the quality analysis."
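One way to picture a quality check living in the middleware is a small reconciliation step that runs over the duplicated copies of a value before anything is served to an application. The Python sketch below is a simplified assumption rather than a description of any shipping product: the function names, the source labels and the rule of stripping non-digits from phone numbers are all hypothetical.

```python
import re
from typing import Dict, Optional


def normalize_phone(raw: str) -> str:
    """Keep digits only so formatting differences do not look like conflicts."""
    return re.sub(r"\D", "", raw)


def reconcile(copies: Dict[str, str]) -> Optional[str]:
    """Return the agreed value, or None when the copies genuinely disagree.

    `copies` maps a (hypothetical) source name to the value that source holds.
    """
    distinct = {normalize_phone(value) for value in copies.values()}
    return distinct.pop() if len(distinct) == 1 else None


# Same number, formatted differently after replication or ETL.
consistent = {"crm": "(617) 555-0100", "warehouse": "617-555-0100"}
# Two copies that really do disagree and need a human or a rule to resolve.
conflicting = {"crm": "617-555-0100", "warehouse": "617-555-0199"}

print(reconcile(consistent))
print(reconcile(conflicting))
```

A real deployment would need richer rules (survivorship, timestamps, trusted-source rankings), but the placement is the point: the check runs in the shared layer, not separately inside each application.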
Decoupling data from applications and storing it in the middleware layer also presents serious concerns about application performance, Yuhanna said. That's why Forrester believes that in-memory data-caching software, which minimizes disk I/O, will become increasingly important.
"In the next coming years, the first layer of access for most of the applications will be the cache," he said. "It's just like Google. Google caches a lot of data, and your Google application today runs faster than a lot of enterprise applications."
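The cache-first access pattern Yuhanna describes can be sketched as a read-through cache: the application asks the cache, and only a miss reaches the database. The Python below is a minimal illustration under stated assumptions. `ReadThroughCache` and `load_from_database` are hypothetical names, the loader is a placeholder for a real disk-bound query, and the sketch deliberately omits eviction and invalidation, which any production cache would need.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """In-memory cache that falls back to a slower loader on a miss."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader
        self._store: Dict[str, str] = {}
        self.backend_reads = 0  # counts how often the slow path is taken

    def get(self, key: str) -> str:
        if key not in self._store:
            # Cache miss: go to the backend once and remember the result.
            self.backend_reads += 1
            self._store[key] = self._loader(key)
        return self._store[key]


def load_from_database(key: str) -> str:
    # Placeholder for a disk-bound query against a real DBMS.
    return f"row-for-{key}"


cache = ReadThroughCache(load_from_database)
first = cache.get("c1")   # miss: reads from the backend
second = cache.get("c1")  # hit: served from memory

print(first, second, cache.backend_reads)
```

After the two calls the backend has been read only once, which is the disk I/O saving the approach is meant to deliver.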
Getting started on data virtualization
According to Forrester, none of today's software vendors offers a complete data virtualization package, but firms like Oracle, Microsoft, IBM, BEA and Red Hat's MetaMatrix division are making significant headway toward that end.
Companies interested in launching a data virtualization initiative need to understand that it could take years to complete and that it's probably best to proceed gradually, Yuhanna said.
"I think the way to approach it would be to first look at the most common data that is shared. It could be addresses, telephone numbers, or it could be some financial data," he said. "Look at those common points of shared data and bring them to the virtualized view so that they are consistent and they are able to share with the applications. Then you can add on additional values or data as you grow."