In the world of my database application, a person has an ethnicity. The input for ethnicity comes from a fixed pick-list, with values like:
- Black/African American
- Native American
- Southeast Asian
- other Asian/South Pacific Islands
I can see a few ways of handling this:
An "ethnicity" column of type TEXT in the People table, storing the exact values above. What worries me about this is the possibility of the UI developers making a seemingly innocent change to the values sent from the form, e.g. changing 'other' to 'Other' or adding a hyphen between Southeast and Asian, and thereby muddying the data.
Similar to 1, but passing every value through a standardizing text formatter first, that might change all characters to lower case, and replace all consecutive non-letter characters with a single underscore. So the last example would become: 'other_asian_south_pacific_islands'. Problems: it's still not immune to spelling changes (e.g. 'Islands' to 'Islander'), and it's difficult to format the values back to their original format when queried for reports.
Have the form send back numeric codes rather than text, and have the back-end software translate that to text before submitting it to the database. Possible disadvantage: the textual representation is now coded both into the UI's display and in the software.
Create a separate Ethnicity table, with a unique numeric identifier column, and a TEXT column for the descriptive text. The form would send identifiers, which the back-end would store directly to the People table's ethnicity column (a foreign key to the new table's identifier column). Possible disadvantage: the textual representation is now coded both into the UI's display and the database. Queries also become somewhat more complicated.
Which would you favor?
Presumably there's a legitimate purpose for collecting and storing this information, most likely for some HR policy. Without controls of some kind on the valid options, the "garbage in, garbage out" rule will prevail. HR will therefore probably oppose any scenario where you cannot demonstrate that there is adequate control. On that basis, options 1 and 2 should be discarded.
Option 3 is on the right track, but success depends on the whims of both the UI developers and backend developers, and how well they communicate and coordinate their efforts when changes are required, not to mention the time and effort involved in making those changes. HR's not going to like this option either.
That leaves option 4. In this scenario, the Ethnicity table is an example of a lookup table. Not only does it make the People table smaller, but it affords total control, through relational integrity, over the valid options that can be selected. If a spelling change is required, or a new ethnicity option needs to be added, you make one change, in the lookup table, and you're done. Furthermore, you would give this administrative function to HR. The UI and backend developers are no longer involved, because the UI's display would be populated directly from the lookup table.
Are the queries somewhat more complicated? Well, yes, but only somewhat. A simple join. Hardly complicated at all.
For More Information
- Dozens more answers to tough SQL questions from Rudy Limeback.
- The Best SQL Web Links: tips, tutorials, scripts, and more.
- Have an SQL tip to offer your fellow DBAs and developers? The best tips submitted will receive a cool prize. Submit your tip today!
- Ask your technical SQL questions -- or help out your peers by answering them -- in our live discussion forums.
- Ask the Experts yourself: Our SQL, database design, Oracle, SQL Server, DB2, metadata, object-oriented and data warehousing gurus are waiting to answer your toughest questions.
Dig Deeper on Oracle and SQL
Have a question for an expert?
Please add a title for your question
Get answers from a TechTarget expert on whatever's puzzling you.