With data warehousing software, one of the most common constraints is the time window available for batch extract processing on the source systems. Because the resource-intensive extract process restricts access to critical source systems, it typically has to run outside business hours.
Low-impact, real-time data integration software can liberate your systems from batch windows. When the extract component uses a nonintrusive method, such as reading database transaction logs to capture only the changed data, it places no burden on source systems. Hence, extraction can happen at any time of day, and throughout the day, while users remain online.
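To make the mechanism concrete, here is a minimal Python sketch of log-based change data capture. The `LogRecord` structure and the log-sequence-number (LSN) bookkeeping are illustrative assumptions, not any vendor's actual log format or API; real transaction logs (Oracle redo, SQL Server transaction log) are binary and read through vendor tooling.

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    lsn: int          # log sequence number: position in the transaction log
    table: str
    operation: str    # "INSERT", "UPDATE" or "DELETE"
    row: dict

def capture_changes(log, last_lsn):
    """Return only records written after last_lsn -- the changed data,
    read from the log instead of re-querying the source tables."""
    changes = [r for r in log if r.lsn > last_lsn]
    new_lsn = changes[-1].lsn if changes else last_lsn
    return changes, new_lsn

# A toy, in-memory stand-in for a transaction log.
log = [
    LogRecord(101, "orders", "INSERT", {"id": 1, "amount": 50}),
    LogRecord(102, "orders", "UPDATE", {"id": 1, "amount": 75}),
    LogRecord(103, "customers", "INSERT", {"id": 9, "name": "Acme"}),
]

# Resume from checkpoint 101: only the two newer records are captured.
changes, checkpoint = capture_changes(log, last_lsn=101)
```

Because the reader touches only the log, the source tables see no extra query load, which is the point the paragraph above makes about nonintrusive extraction.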
When that extraction occurs in real time, the data can bring exceptional value to the business, although it changes how the data collection process must be tuned to handle real-time data. What’s more, that data has to be effectively protected, and it’s difficult to apply disaster recovery and backup techniques to data that is constantly in motion.
But the very technology that can bring real-time data integration to data warehouses can also be used to further protect that data. After all, technology that moves data in real time also interacts with data in real time, creating an entry point for data protection technologies. Yet the speed and efficiency of data in motion may be affected by latency introduced during the protection process.
That means one of the first considerations when moving to an active data collection scheme that integrates with a data warehouse should be the flow of the data across IT systems and the latency that may be introduced. In other words, real-time data integration requires an understanding of data in motion and the components that enhance or hinder that movement.
Obviously, enterprises want to protect their data. And as the appetite for data grows, storage technology becomes a critical business asset on which business continuity relies. As real-time data analysis becomes part of line-of-business processes, it too falls under the realm of continuity. The most basic way to provide data safety and continuity is hardware or software replication that automatically maintains a secondary copy of critical data. In-house backup methods built on open source software are also not unheard of.
Enterprises are investing in five critical areas related to data management: disaster recovery, high availability, backup, data processing performance and migration to more advanced databases. That sets the stage for IT to pursue advanced technology, such as real-time data integration and its associated infrastructure elements. Also, those strategic investments can provide the budgetary resources to accelerate the adoption of real-time technologies while improving the return on investment and justifying the business case proposed for a real-time data integration project.
It is critical to map those investment areas to the corresponding elements of a real-time data integration system, however, and that takes an intimate understanding of the components that make up such a system and how those components are driven by organizational data requirements. These include the following:
- Data volume (size of data and number of updates)
- Data movement frequency
- Transformation requirements
- Outage windows and business continuity
It is those elements that will drive which products are chosen to build a comprehensive infrastructure for real-time data integration. But the term real time takes on a somewhat different meaning when incorporating data acquisition technologies. Some technologies focus on the concept of “right-time” for business intelligence (BI): the term denotes the varying needs of end users in accessing intelligence, needs that differ from one use case to the next.
But for operational data warehousing, the technology should not rely on a right-time paradigm. The technology should deliver true real-time capabilities and then let the business user choose the right time to access the data. Nevertheless, some businesses may find value in the right-time BI philosophy, which raises the question: when should an organization use real-time data integration?
In the real world, corporations use mixed IT architectures from multiple vendors (often a legacy of corporate history). When selecting a real-time data integration technology, look for one that can easily pull together information from a variety of database and application platforms. This is the biggest key to success.
The integration platform is the foundation for real-time data, and cross-product compatibility is one of its core tenets. But finding a platform that combines those elements and supports real-time processing without introducing difficulties is a challenge.
Oracle’s product for this platform is GoldenGate, which works with Oracle Database and rival products. There are also other real-time platforms out there, all of which should be examined under several scenarios where real-time data integration is under consideration:
High availability. The platform should automatically maintain a live remote copy of your application’s data, so that in a disaster recovery scenario the business application can fail over to secondary storage with minimal downtime.
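The failover decision itself can be sketched in a few lines. This is a conceptual illustration of promoting a healthy secondary, not any product's actual failover mechanism; the server names and health map are hypothetical.

```python
def failover(servers, health):
    """Pick the first healthy server from a priority-ordered list.
    If the primary is down, traffic moves to the secondary copy."""
    for name in servers:              # servers listed in priority order
        if health.get(name):
            return name
    raise RuntimeError("no healthy copy available")

# Primary is down, the replicated secondary is healthy: fail over.
active = failover(["primary", "secondary"],
                  {"primary": False, "secondary": True})
```

In practice the hard part is not this selection logic but keeping the secondary copy current, which is exactly what real-time replication provides.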
Live migration. Upgrading, migrating or maintaining a production system typically involves downtime. A real-time data integration platform would ideally enable zero-downtime migration, so the new system can be populated with the old system’s data while the old system stays online.
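A zero-downtime migration typically works in two phases: bulk-load a snapshot into the target, then replay the changes that users made while the load was running. The sketch below models both systems as plain dictionaries; it is a simplified illustration of the pattern, not a real migration tool.

```python
def zero_downtime_migrate(snapshot, changes_during_load):
    """1. Bulk-load the snapshot into the new system.
    2. Replay changes captured (e.g. via CDC) while the load ran.
    3. The target is then current and traffic can cut over."""
    target = {}
    target.update(snapshot)                   # phase 1: initial load
    for key, value in changes_during_load:    # phase 2: catch-up
        target[key] = value
    return target

old_system = {"acct:1": 100, "acct:2": 200}
# Changes users made while the bulk load was running:
in_flight = [("acct:1", 150), ("acct:3", 50)]
new_system = zero_downtime_migrate(old_system, in_flight)
```

The catch-up phase is what distinguishes this from a plain export/import: because in-flight changes are captured and applied, the old system never has to be taken offline for the copy.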
Integration of heterogeneous systems. Your applications rely on Oracle, Microsoft SQL Server, Sybase and DB2. A real-time data integration platform can make them all operate on the same shared data with minimal integration effort.
Mergers, acquisitions and IT consolidation in a growing enterprise. Before your final, uniform architecture is devised, a change data capture technique can quickly consolidate data from branches and departments. (By the way, we both know that there is no such thing as “final architecture.”)
Query offloading. An interesting side effect of sharing replicated data among multiple data marts is improved OLTP performance and availability. Queries processed simultaneously by multiple servers execute faster, with reports ready sooner.
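The offloading idea reduces to routing: send writes to the OLTP primary and spread read queries across the replicated data marts. The sketch below is a minimal illustration under that assumption; the server names are hypothetical, and real routing is usually done by a connection pooler or the integration platform itself.

```python
import random

class ReadWriteRouter:
    """Route writes to the OLTP primary and reads to replica data marts,
    so reporting queries never burden the transactional system."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def route(self, sql):
        # Naive classification: SELECT statements are reads; everything
        # else (INSERT/UPDATE/DELETE/DDL) goes to the primary.
        if sql.lstrip().upper().startswith("SELECT"):
            return random.choice(self.replicas)   # spread read load
        return self.primary

router = ReadWriteRouter("oltp-primary", ["mart-1", "mart-2"])
```

Because the marts hold replicated copies, any of them can answer a report query, which is where the performance and availability gains come from.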
Oracle customers have additional options that can strengthen the real-time data integration process. Products such as Oracle Active Data Guard are useful if the source and copies are based on identical Oracle versions and data models, while Oracle Real Application Clusters promises transparent application failover if the copies are in close proximity. Finally, with the Infrastructure as a Service model winning over the market for its price and elasticity, you can take advantage of a remote cloud to host a secondary copy of the business-critical data.
This was first published in August 2012