I am in the process of designing a data model which needs to be extremely flexible to allow new entities/attributes to be added in the future. I have been considering using a very generic model to prevent the need for additional tables/columns to be added (requiring the data model to be continually changed) each time a new entity/attribute is required. It's generic nature would be achieved by saving data on a row basis instead of on a column basis, where every row is tagged with its corresponding definition, contained in a dictionary table. For the application I am working on, this is possible as each new entity is usually a slight variation of existing ones so they can be grouped together. My question is what is the drawback in using this type of method in terms of performance degradation and database size as compared to hardcoding table and column names, requiring changes to be made each time a new entity or attribute is needed?
It sounds like you are describing a variation of an associative database.
Current implementations of associative databases run roughly 80 times slower than static relational databases that do the same thing. This isn't as big a drawback as it might seem if there is not much data that needs manipulation in your database. Something that takes 120 milliseconds for a relational database should take around ten seconds for an associative database running on comparable hardware. Unfortunately, this means that something that takes ten seconds for a relational database would take nearly fifteen minutes!
The database size grows significantly in an associative database too, but it isn't as noticeable as the change in processing time. You can assume somewhere between a three- and a ten-to-one increase in storage.
One approach that might work well in this case is to create a base generic table that has all of the commonly used columns. Create a dictionary table that would define additional attributes that would be used only by some rows. Insert the base information into a row into the main table for each new row, then add additional rows into a "child" table to "flesh out" the missing data.
One potentially serious problem with this approach that is inherent in the associative model is that there isn't a good way to enforce constraints on the associative data. This can force your application to grow in order to cope with missing or invalid data, and can become a flaming pain in the posterior if there is ever a need to create a data warehouse for this data.
For More Information
- What do you think about this answer? E-mail us at editor@searchDatabase.com with your feedback.
- The Best Database Design Web Links: tips, tutorials, scripts, and more.
- Have a Database Design tip to offer your fellow DBA's and developers? The best tips submitted will receive a cool prize--submit your tip today!
- Ask your technical Database Design questions--or help out your peers by answering them--in our live discussion forums.
- Ask the Experts yourself: Our Database Design guru is waiting to answer your toughest questions.
This was first published in April 2001