You are reading this article because you are interested in integrating data from other databases such as SQL Server
into Oracle, and you wish to make an informed decision on the feasibility of this undertaking. In order to make this informed decision, you might also search Google to find a few relevant blogs, read other articles discussing the same concept, and view a multimedia demo from Oracle's Web site showing the technologies in action.
This is how a human being operates: we assimilate data from many different sources, process it and integrate it into our central storage known as the brain. Using this data, we are able to make informed decisions and progress in our day-to-day activities. In database terms, we extract data from hundreds of sources, transform this data into one format that our brain can process, and load the data into various pools in our brain, thus making the data useful for our daily analysis.
Now consider your server room, and the many sources of data that may be contained there. If all this data could be integrated into one central repository, a whole new range of possibilities would open. In this article, I'll provide an overview of how to do this using Oracle.
A real-life example
Consider the following database environment:
- An Oracle OLTP environment that processes sales.
- A SalesLogix database in SQL Server that stores call information.
- A legacy AS/400 repository of HR information.
- An Oracle data warehouse that aids in sales analysis, reporting, and BI.
Unfortunately, the only system that is native to the data warehouse is the OLTP environment, as they are both running Oracle. Sales are analyzed, but a greater degree of information could be accessible if only Oracle could integrate external data easily.
The jargon of integration
Before investigating the tools Oracle provides to bring your data together into an integrated database, it is very useful to understand a few key acronyms that are heavily used in the data integration world.
ETL: Extract, transform and load
The ETL concept has been around for quite a long time, and has been a popular method of pulling data from multiple sources and integrating this data into one central warehouse. The concept of ETL is based on three defined processes:
- Extract—This process involves the extraction of data from any source. In the example above, the data sources would be a SQL Server database, an AS/400 repository and an Oracle OLTP system. Data would be extracted from each of these sources by any means possible, whether through a direct database connection, a sophisticated data standard such as XML, or the always handy comma-separated values (CSV) file.
- Transform—During the transformation process, all data extracted in the previous phase would be cleaned and standardized into one set format. For instance, phone numbers and state names may require standardization between several original formats. This process would usually take place on a staging server that is made to perform the complex transformation procedures.
- Load—The transformed data is applied to the data warehouse by means of a data loading mechanism.
ELT: Extract, load and transform
This methodology is just like ETL, except that you decide where the transformation phase takes place. ELT involves performing the transformation either on the source or target database and eliminating the staging server. It really all depends on the needs of the source and target systems, and could just as easily be reconfigured as TETL, TELT, ETLT or TETLT. ELT tools use code and various methods of data movement in order to eliminate the need for a staging database system.
EII: Enterprise information integration
EII is a relatively new concept that has become popular in the world of service-oriented architecture (SOA) and business intelligence (BI). Think of EII as "information on demand." The concept is that there are many sources of data within a single corporate network, and nearly limitless sources of data all over the globe. By bringing this data together only when necessary, fewer resources can be spent pulling and loading data, and more resources can be spent on bringing data from multiple sources. EII-enabled technologies use several mechanisms for providing this on-tap form of data, including Web services, enterprise service bus (ESB) and B2B servers, all with the final goal of creating an abstraction layer consisting of virtual tables that can be queried as necessary.
Oracle integration tools
Oracle has several tools that can be used to assist with data integration. These tools allow heterogeneous access from a wide range of data sources such as SQL Server, MySQL, DB2, XML, CSV, WSDL and many other databases and formats.
ODI: Oracle Data Integrator
The ODI allows data to be pulled from a multitude of sources, cleaned and then loaded to a multitude of targets. This is accomplished through a code base with hot-pluggable components called knowledge modules that allow heterogeneous access to or from nearly any remote source.
Using process flows and data mapping, ODI is able to perform data integration in a method akin to ELT processing. Data is extracted from multiple sources, sent through several transformation processes and loaded into a final destination. Transformations can occur on the source systems or on the target environment through platform-specific code defined by the included knowledge modules.
In addition, ODI is able to provide data services to your SOA suite using native hooks built into the ODI software.
ESB: Enterprise service bus
Oracle's ESB is part of the SOA suite. The ESB is a component designed to extract data from Web services (regardless of the standard used), then transform the data in real time into immediately usable information by way of transformation mappings. Using this methodology, Oracle is able to achieve EII-like capabilities. Though EII is not purely based on Web services, Oracle's SOA suite provides integrators with nearly all the tools they need to leverage on-demand data via the EII architecture.
Oracle BI Publisher
Formerly known as XML Publisher, this tool has a pseudo-EII feel. Multiple data sources can be easily defined, such as XML flat files, Web services or direct database connections. Using predefined display templates, BI Publisher is able to produce complex reports with this on-demand data. The reason this is not true EII is that the multiple data sources do not come together to form an abstraction layer of virtual tables or views. Instead, BI Publisher can almost be thought of as a single-purpose real-time EII solution.
Transparent Gateways, SQL Loader, PL/SQL
It is still completely possible to write your own integration architecture using command-line scripts. While many shops are fully embracing the SOA and BI suites, some developers and administrators remain adamant that command line is best.
Most of the infrastructure behind tools such as ODI, ESB and BI Publisher is available right from the command line. Built-in packages facilitate the parsing, searching and creation of XML. SQL Loader allows both simple and complex flat files to be loaded into staging tables for processing. PL/SQL code can be written to clean, manage and integrate your data. And most importantly, Oracle Transparent Gateways will allow you to access data from a multitude of sources, such as SQL Server, heterogeneously.
Using Transparent Gateways, it is possible to create database links pointing to external database resources such as a SQL Server database. Oracle scripts can then extract the data as if it were a standard Oracle table for processing. If an EII interface is the preferred data access methodology, queries can also be written to join SQL Server (or other database) tables to Oracle tables in an integrated fashion for real-time queries without physical consolidation.
Whether you are using command-line tools for a scripted approach to data integration or a robust tool such as ODI, SOA Suite or BI Suite, Oracle makes data in a multitude of databases highly accessible.
Our "real-life example" above would pose little problem for the developer armed with these tools. The OLTP database, the SQL Server database and the AS/400 system would all be accessed through a single interface from which a final integrated database could be created.
About the author
Steve Karam is one of fewer than 20 DBAs worldwide to achieve the coveted Oracle 10g Certified Master status, and he has also been honored with the prestigious Oracle ACE designation. A former senior instructor for Oracle University, Steve has a proven track record in performance and troubleshooting on dozens of high-profile Oracle systems and complex Oracle 10g RAC environments. Combining his organizational and technical skills, Steve serves as a Senior Oracle Consultant for Burleson Consulting. Steve investigates an unprecedented number of Oracle databases and he excels at Oracle database assessment, Oracle tuning and system optimization. Steve is also an accomplished Oracle instructor and teaches dozens of Oracle classes for Burleson Consulting.