Here is the continuation of Rudy's answer. See part
A self-identifying key allows you to recognize which entity it represents. For example, Title would be a suitable primary key for a Book table because it pretty much identifies each book (unless your database might hold more than one book with the same title, in which case you'd need to include Author in a compound primary key). If you had a BookOrder table, the foreign key to Book would be Title (or Title plus Author), and you wouldn't need to join BookOrder to Book to figure out which book the order was for. Thus, a self-identifying key has the advantage of simplifying queries. A surrogate key has no meaning, and always requires joins.
But Title and Author are bulky, because realistically they would be VARCHAR(100) or something. This means all foreign keys would be bulky too. Most physical database designers frown on bulky foreign keys, especially if there are many related tables. Notice that there's nothing wrong with a bulky primary key if it has no related foreign keys!
Sometimes natural keys can repeat. It's quite common for two people to have the same name, so FirstName and LastName are usually inadequate as the primary key for an Employee table.
Sometimes natural keys can be null. A recently hired employee, prior to being issued an employee number, might still need to be entered into an employee table. It is common in this type of situation for a table to have a surrogate, generated key, so that the code number (employee number), which is not the key, can be null. So even where there's a column like employee number, which in this context would be natural (although it isn't particularly self-identifying), you'll often also have a surrogate key.
Finally, natural keys do change. When they do, the change must propagate to related tables if the natural keys were used as primary keys.
Given all of the above, are there any situations where natural keys are good primary keys? Sure, all the time. Consider a table of countries. My choice for primary key would be the two-character country code, because it's self-identifying, and because a surrogate key would be, at least to me, pretty useless. But rarely do you see an actual country table. More often, you just see the two-character country code used in other tables. It's really a foreign key, when you think about it. Even the most fanatical DBAs, who (quite rightly, in my mind) would insist on surrogate keys for every table, probably won't notice that natural country code foreign key slip past them.
For additional information about surrogate keys, see
For More Information
- Dozens more answers to tough SQL questions from Rudy Limeback.
- Have an SQL tip to offer your fellow DBAs and developers? The best tips submitted will receive a cool prize. Submit your tip today!
- Ask your technical SQL questions -- or help out your peers by answering them -- in our live discussion forums.
- Ask the Experts yourself: Our SQL, database design, Oracle, SQL Server, DB2, metadata, object-oriented and data warehousing gurus are waiting to answer your toughest questions.
This was first published in June 2002