As with any IT project, the integration of real-time data into a data warehouse can benefit from best practices. Luckily, many of the recommendations that work for data warehousing are also applicable to real-time data integration and follow some common-sense rules.
Set up project organization early on. Project management is critical when it comes to data integration. Oracle Data Integrator (ODI) provides many tools that help organize the development work and project lifecycle, including security profiles, development projects, folders, markers, versioning, import and export, and printable PDF documentation, all of which support project management. Review those tools and features and use them to manage the project. From the beginning of the project, define the guidelines, the organization, the naming conventions and anything else that will avert chaos and enforce best practices.
Make sure IT pros understand the platform topology and contexts. In the world of Oracle, the ODI topology and its associated contexts are two of the most powerful features for running design-time or run-time artifacts in various environments. ODI runs on top of a logical architecture (logical schemas, logical agents) that resolves in a given context to a physical architecture and ODI run-time agents. Contexts define the connections between the data and its underlying architecture such as databases and data servers, and they allow designers to switch the execution of the artifacts from one environment to another.
The key is to fully grasp the differences between logical and physical architectures and understand how the concept of context affects the design and transformation of information. Understanding that helps to avoid some common context mistakes, such as not correctly performing logical or physical architecture mappings for a given context. Such a mistake leads to executions that won’t work in a given context. Another mistake is forcing context in situations where it is not warranted. Unnecessary context wastes clock cycles and adds latency to the data integration process. Context should only be forced when there is a functional reason for it.
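The logical-to-physical resolution described above can be illustrated with a minimal sketch. It is not ODI code: the mapping table, the schema and context names, and both functions are hypothetical, chosen only to show why a missing mapping makes an execution fail in one context while it works in another.

```python
# Hypothetical model of a topology: none of these names are ODI APIs.
# Each (logical schema, context) pair resolves to one physical schema.
TOPOLOGY = {
    ("SALES_LS", "DEV"): "dev_db.sales_dev",
    ("SALES_LS", "PROD"): "prod_db.sales",
    ("STAGING_LS", "DEV"): "dev_db.stg_dev",
    # ("STAGING_LS", "PROD") is deliberately missing to show the common mistake.
}


class UnmappedSchemaError(LookupError):
    """Raised when a logical schema has no physical schema in a context."""


def resolve(logical_schema, context):
    """Return the physical schema that a logical schema maps to in a context."""
    try:
        return TOPOLOGY[(logical_schema, context)]
    except KeyError:
        raise UnmappedSchemaError(
            f"{logical_schema!r} has no physical schema in context {context!r}"
        ) from None


def missing_mappings(logical_schemas, contexts):
    """List every (logical schema, context) pair that has no mapping."""
    return [
        (schema, context)
        for schema in logical_schemas
        for context in contexts
        if (schema, context) not in TOPOLOGY
    ]
```

Running a check like `missing_mappings(["SALES_LS", "STAGING_LS"], ["DEV", "PROD"])` before promoting work to a new environment would surface the unmapped pair instead of leaving it to fail at run time.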
Incorporate context-independent design. In a context-independent design, ODI artifacts (procedures, variables, interfaces) contain no hard-coded elements. The biggest mistake is hard-coding qualified object names into expressions and code. Hard-coded names aren’t context-aware, so the schema names will be incorrect when the artifact runs in an environment under a different context. Schema names are defined in the topology, and the same logical schema resolves to a different schema name depending on the execution context. When working across contexts, use a substitution coding method to generate qualified names rather than typing them in.
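To make the difference concrete, here is a small sketch of substitution-based naming. It does not reproduce the syntax of ODI's own substitution methods; the `{LOGICAL_SCHEMA.TABLE}` placeholder format, the `render` function and the schema names are all assumptions made for illustration.

```python
import re

# Hypothetical mapping of (logical schema, context) to a physical schema name.
PHYSICAL_SCHEMAS = {
    ("SALES_LS", "DEV"): "SALES_DEV",
    ("SALES_LS", "PROD"): "SALES",
}

# Illustrative placeholder syntax: {LOGICAL_SCHEMA.TABLE}
PLACEHOLDER = re.compile(r"\{(\w+)\.(\w+)\}")


def render(sql_template, context):
    """Replace each placeholder with the qualified name for the given context."""
    def qualify(match):
        logical_schema, table = match.group(1), match.group(2)
        return f"{PHYSICAL_SCHEMAS[(logical_schema, context)]}.{table}"

    return PLACEHOLDER.sub(qualify, sql_template)


# Hard-coded: only correct in the DEV environment.
HARD_CODED = "SELECT * FROM SALES_DEV.ORDERS"

# Context-independent: the schema is resolved when the code is generated.
PORTABLE = "SELECT * FROM {SALES_LS.ORDERS}"
```

With these definitions, `render(PORTABLE, "DEV")` yields the same text as `HARD_CODED`, while `render(PORTABLE, "PROD")` points at the production schema without any change to the template.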
Use procedures only when needed. Procedures allow you to perform complex actions, such as SQL queries, and use source and target connections as well as data binding. Procedures can move data, but they shouldn’t be used to both move and transform it. Interfaces should perform those operations, much to the chagrin of developers who are comfortable using SQL.
The problem with procedures is that they contain hand-written code that has to be maintained by hand. What’s more, procedures do not maintain cross-references to other ODI artifacts such as data stores and models, which makes them harder to maintain than interfaces.
Enforce data quality. Project managers often don’t take into account the quality of their data. This is a huge mistake, since the data integration process may move and transform garbage data and propagate it throughout the applications. Luckily, ODI allows you to enforce data quality of source data using static checks. It also lets you use flow checks to determine the quality of the data before it is pushed into a target. Using both these checks, you can make sure that the quality of the data is improved or enforced when the data is moved and transformed. Make sure you always enforce data quality using both static and flow checks.
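The distinction between the two kinds of check can be sketched as follows. This is an illustration of the idea only, not of how ODI implements it: the constraint names, the row format and both functions are hypothetical.

```python
# Hypothetical constraints; each maps a description to a predicate on a row.
CONSTRAINTS = {
    "customer_id is not null": lambda row: row.get("customer_id") is not None,
    "amount is non-negative": lambda row: (row.get("amount") or 0) >= 0,
}


def violations(row):
    """Return the descriptions of every constraint the row breaks."""
    return [name for name, is_valid in CONSTRAINTS.items() if not is_valid(row)]


def static_check(rows):
    """Audit data at rest: report offending rows without moving anything."""
    report = []
    for row in rows:
        broken = violations(row)
        if broken:
            report.append((row, broken))
    return report


def flow_check(incoming, target, errors):
    """Check rows in flight: only clean rows reach the target.

    Rejected rows are set aside with the reasons they failed.
    """
    for row in incoming:
        broken = violations(row)
        if broken:
            errors.append((row, broken))
        else:
            target.append(row)


SOURCE_ROWS = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": None, "amount": 5},
    {"customer_id": 2, "amount": -3},
]
```

A static check over `SOURCE_ROWS` reports the two bad rows where they sit; a flow check over the same rows loads only the first one into the target and records the other two as errors.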
Address error processing. With ODI you can address error cases with packages, which allow you to sequence any number of steps. Any step in a package can fail; for example, the target or source database may be down, or too many errors may be detected in one interface. When building packages, consider the different types of errors that may crop up and cause a failure. With ODI, this is a simple matter of defining, for each step, the path to follow on an OK status (success) and the path to follow on a KO status (failure). Defining both paths makes packages far more robust.
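The OK/KO branching can be pictured with a short sketch. It models the idea rather than ODI itself: the `Step` class, `run_package` and the step names are hypothetical, and a raised exception stands in for a KO status.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One package step with an explicit path for each outcome."""
    name: str
    action: Callable[[], None]
    on_ok: Optional[str] = None  # next step after success; None ends the run
    on_ko: Optional[str] = None  # next step after failure; None ends the run


def run_package(steps, first):
    """Run steps from `first`, following on_ok or on_ko after each one.

    Returns the list of (step name, status) pairs in execution order.
    Assumes the paths contain no cycles.
    """
    trace = []
    current = first
    while current is not None:
        step = steps[current]
        try:
            step.action()
        except Exception:
            trace.append((step.name, "KO"))
            current = step.on_ko
        else:
            trace.append((step.name, "OK"))
            current = step.on_ok
    return trace


def failing_load():
    # Simulates a step that fails because the target database is down.
    raise ConnectionError("target database is down")


PACKAGE = {
    "load_orders": Step("load_orders", failing_load,
                        on_ok="refresh_totals", on_ko="send_alert"),
    "refresh_totals": Step("refresh_totals", lambda: None),
    "send_alert": Step("send_alert", lambda: None),
}
```

Calling `run_package(PACKAGE, "load_orders")` follows the KO path to `send_alert` and never reaches `refresh_totals`. A step with no `on_ko` path simply stops the run on failure, which is the situation the paragraph above warns against leaving unplanned.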
Choose the right knowledge module. The knowledge module (KM) choice is critical when designing interfaces. The KM choice affects what features are available and the performance of the interface. Beginners often make mistakes with KMs. Often new designers choose an advanced or complicated KM to get interfaces up and running quickly and overlook all the requirements needed for a KM. For example, if you choose technology-specific KMs using loaders, the interface won’t work if the loader configuration isn’t correct. The best bet is to start working with generic KMs (usually SQL). After designing your first interfaces, you can switch to specific KMs—but read their descriptions first—and leverage all their features.
KMs with extra features can hurt performance. Performing a simple insert is faster than performing an incremental update. If you are deleting the data in the target before integration, don’t use an incremental update. That’s over-engineering, and it will cause performance loss. Use the KM that fits your needs. Similarly, activating or deactivating some of the KM features may add extra steps that can degrade the performance. Default KM options are sufficient for running the KM. After running it with default options, review the options to determine if some of them can be changed to suit your needs. The KM and option descriptions are good documentation for understanding how to best use KMs. KMs offer a powerful framework that can be used at any point during an integration flow in ODI. Many KMs are available out of the box and support a large number of database features.
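The point about over-engineering can be shown with a simplified sketch. It does not model any actual KM: both functions and the row format are hypothetical, and the lookup counter is only a rough stand-in for the extra work an incremental update performs.

```python
def append_load(target, rows):
    """Simple insert: add every row without looking at the target."""
    target.extend(rows)
    return 0  # no key lookups were needed


def incremental_update(target, rows, key):
    """Insert new rows and update changed ones, matching on `key`.

    Returns the number of key lookups performed.
    """
    index = {row[key]: position for position, row in enumerate(target)}
    lookups = 0
    for row in rows:
        lookups += 1
        position = index.get(row[key])
        if position is None:
            index[row[key]] = len(target)
            target.append(row)
        elif target[position] != row:
            target[position] = row
    return lookups


ROWS = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}, {"id": 3, "qty": 1}]

# The target was emptied before integration in both cases.
appended, merged = [], []
append_cost = append_load(appended, ROWS)
merge_cost = incremental_update(merged, ROWS, key="id")
```

Both targets end up identical, but the incremental path checked every row against a target that was known to be empty. When the target is not emptied first, that matching is what keeps existing rows from being duplicated, so the extra cost buys something.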
Best practices can save countless hours on a data integration project, and following them when working with ODI will breed success and help establish a working framework for data integration.
This was first published in August 2012