Problem solve Get help with specific problems with your technologies, process and projects.

Matching data coded in databases spread across the country

I have a critical and complicated situation because I have distributed an application around the country, and now the central level wants to consolidate and have the posibility to make decisions about the data store in all the databases. We don't know how to match all the data coded in all those databases. For instance: In one database you can have the speciality Informatico that is coded by the application and in that case it can take the code value "8654". And in the other database you can have the speciality Informatico that is coded by the application, and in that case it can take the code value "8864" etc. If I need to get all the people in all the country that have the speciality Informatico, I don't know how to get it with that structure. Do you have any idea?

You have experienced the Data Stewardship "blackhole". I assume the application is essentially the same across the country. Same software, same edits and business rules, same data structures. What's not the same is the coding standards from location to location. No central control of corporate level subject areas like product (my assumption of "informatico"). Yet, now the executive management needs to see roll ups across the country for corporate-level and commonly know concepts, such as sales by product using the corporate-level product-catalog nomenclature. What you are being asked to do is best provided via the design and implementation of an architected data warehousing environment.

Here's some cliff notes/suggestions to get you started. We need to get into much more detail to ensure ongoing success.

1st: quickly model the corporate level set of subject areas and key entities within those subject areas

2nd: analyze and associate the application's database tables to those corporate-level entities.

3rd: run some "content-analysis" tools against the distributed applications' tables so you can product some statistics of common, duplicate, and variances in codes, such as "Informatico".

4th: sit with the business community to get a corporate level definition of what is good data. For example: what is the official corporate-level list of Informatico codes and what do they means?

5th: build a cross-reference map between the distributed application table actual values to the above defined corporate-level values.

6th: run the data through these cross-references. Make sure you capture and save all of the distributed application table rows that fail to properly map per the cross-reference definitions. Because you'll want to get with the business to resolve these unexpected data values.

7th: deposit the cross referenced data IN THE FORM OF THE CORPORATE-LEVEL DATA STRUCTURES into the in-take layer of your data warehouse environment. Again, make sure you load the errors too. This breeds accountability. 8th: get a feel for how the management wants to access the data and build the appropriate set of data marts to support those requirements.

Let me know how it goes!

For More Information

Dig Deeper on Oracle database design and architecture

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.