We have all been taught to normalize our relational databases to at least the third normal form. This eliminates data redundancy and decreases the size of the database. This has a lot of benefits, too many for me to list. However, there is a problem that occurs when a designer applies normalization principles on a work tracking system. Even the most experienced database designers overlook this problem, and it will cause problems in the work/performance reports.
To illustrate, assume that we have a work tracking system with the following attributes:
Work Order Number
Work Order Requestor Employee Number
Department Number of the Employee
After normalizing the attributes we end up with two entities (or tables). Work Order Requestor Employee Number is a foreign key to Employee Number.
(A diagram of the schema will appear here shortly.)
This is a typical schema for a tracking system. Using this schema, a system can record the employee number of the employee that has performed something such as requested a work order. This schema also has an Employee table that contains attribute information. Most databases also have an Employee entity. Since there is an Employee table there is no need to document Employee attributes in the Work Tracking table. One of these employee attributes is usually the employee's department.
Relational database technology
SELECT department_number, work_order_number, requestor_number FROM Employee, work_orders WHERE employee_number = requestor_number AND department_number = "A";
This SELECT statement will produce a list of work orders issued by employees currently in Department "A". This might be the result we want or it might not. The word "currently" in the previous sentence might cause a problem.
The purpose of the query was to supply a list of work orders issued by Department "A". If an employee transferred from the department, the employee is no longer a "current" employee of the old department. The employee now has a different department number and will not appear on the Department "A" list. Thus, using the above query will result in the employee's new department getting the credit for the work order requests rather than the old department. This will cause your report writers to pull their hair out, and it is very difficult to fix after a system is implemented.
This is a typical situation that I see in many work-tracking systems, even in the major ERP systems marketed today. The problem occurs when the database designer tries to place department into the proper entity. They believe the Employee's Department Number is dependant upon the Employee, which is true. However, in work tracking systems we are looking for historical information. We need to see the department of the employee at the time the work order was documented. This may be different value than the "current" Department number of the employee. This means the employee's department is dependent upon the work order and should exist within that entity.
A better schema is to place some redundancy into the database. As illustrated in the diagram below (to appear here shortly), the Requestor's Department is placed in the Work_orders table. The attribute is populated from the Department Number value in the Employee table. Now the database has a Department value that will not be affected by transferred employees. Your report writers will love.
In summary, when you are developing any kind of tracking system and you have attributes supplied from a related table that may change over time, be sure to place these historic values in the table of the tracked entity, even if they are appear to be redundant. They won't be. You won't believe the number of personnel changes that can occur.
Feedback about this tip received from member Joe Celko
RE: John Palinski's article on the tracking system. I think a better approach would be to use temporal information in the tables:
CREATE TABLE WorkOrders (work_order_nbr INTEGER NOT NULL PRIMARY KEY, work_order_date DATE NOT NULL, requestor INTEGER NOT NULL REFEREENCES Personnel (emp_nbr) ...)
CREATE TABLE PersonnelHistory (emp_nbr INTEGER NOT NULL, emp_name CHAR(20) NOT NULL, dept_nbr INTEGER NOT NULL DEFAULT 'unassigned' REFERENCES Department(dept_nbr), start_date DATE NOT NULL, final_date DATE, -- null means current dept CHECK(start_date <= final_date), ..., PRIMARY KEY (emp_nbr, start_date))
Now the query is:
SELECT P.dept_nbr, W.work_order_nbr, W.requestor FROM Personnel AS P, Work_Orders AS W WHERE P.emp_nbr = W.requestor AND W.work_order_date BETWEEN P.start_date AND COALESCE(P.final_date, W.work_order_date)
You might also want to have this handy in the schema:
CREATE VIEW CREATE TABLE Personnel (emp_nbr, emp_name, dept_nbr) AS SELECT emp_nbr, emp_name, dept_nbr FROM PersonnelHistory WHERE final_date IS NULL
About the Author
John Palinski has been an Oracle developer and project manager for the last fourteen years. He developed one of one of the first relational database work management systems in the country. Mr. Palinski also teaches his own Oracle courses at Iowa Western Community College and the University of Nebraska at Omaha. Mr. Palinski is the author of the "Oracle Database Construction Kit" published by QUE and provides custom Oracle training worldwide through his consulting business, Realistic Software Training.
For More Information
- What do you think about this tip? E-mail us at editor@searchDatabase.com with your feedback.
- The Best Database Design Web Links: tips, tutorials, scripts, and more.
- The Best Data Modeling Web Links.
- Have a database design to offer your fellow DBA's? The best tips submitted will receive a cool prize--submit your tip today!
- Ask your technical database design questions--or help out your peers by answering them--in our live discussion forums.
- Check out our Ask the Experts feature: Our database design gurus are waiting to answer your toughest questions.
This was first published in February 2001