As you will recall from my previous article, the usual example of a tree structure in SQL books is called an adjacency list model and it looks like this:
CREATE TABLE Personnel (emp CHAR(10) NOT NULL PRIMARY KEY, boss CHAR(10) DEFAULT NULL REFERENCES Personnel(emp)); Personnel emp boss =================== 'Albert' 'NULL' 'Bert' 'Albert' 'Chuck' 'Albert' 'Donna' 'Chuck' 'Eddie' 'Chuck' 'Fred' 'Chuck'
Another way of representing trees is to show them as nested sets. Since SQL is a set oriented language, this is a better model than the usual adjacency list approach you see in most text books. Let us define a simple Personnel table like this, ignoring the left (lft) and right (rgt) columns for now. This problem is always given with a column for the employee and one for his boss in the textbooks. This table without the lft and rgt columns is called the adjacency list model, after the graph theory technique of the same name; the pairs of nodes are adjacent to each other.
CREATE TABLE Personnel (emp CHAR(10) NOT NULL PRIMARY KEY, lft INTEGER NOT NULL UNIQUE CHECK (lft > 0), rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1), CONSTRAINT order_okay CHECK (lft < rgt) ); Personnel emp lft rgt ====================== 'Albert' 1 12 'Bert' 2 3 'Chuck' 4 11 'Donna' 5 6 'Eddie' 7 8 'Fred' 9 10
The organizational chart would look like this as a directed graph:
Albert (1,12) / \ / \ Bert (2,3) Chuck (4,11) / | \ / | \ / | \ / | \ Donna (5,6) Eddie 7,8) Fred (9,10)
To show a tree as nested sets, replace the nodes with ovals, then nest subordinate ovals inside each other. The root will be the largest oval and will contain every other node. The leaf nodes will be the innermost ovals with nothing else inside them and the nesting will show the hierarchical relationship. The rgt and lft columns (I cannot use the reserved words LEFT and RIGHT in SQL) are what shows the nesting.
To convert the graph into a nested sets model think of a little worm crawling along the tree. The worm starts at the top, the root, makes a complete trip around the tree. When he comes to a node, he puts a number in the cell on the side that he is visiting and increments his counter. Each node will get two numbers, one of the right side and one for the left. Computer Science majors will recognize this as a modified preorder tree traversal algorithm.
The code for implementing this in T-SQL is a straight forward stack implementation. First, let's load up some data into a tree table and then create a stack table. I will explain how the stack works in a minute.
-- Tree holds the adjacency model CREATE TABLE Tree (emp CHAR(10) NOT NULL, boss CHAR(10)); -- insert the sample data for testing INSERT INTO Tree VALUES ('Albert', NULL); INSERT INTO Tree VALUES ('Bert', 'Albert'); INSERT INTO Tree VALUES ('Chuck', 'Albert'); INSERT INTO Tree VALUES ('Donna', 'Chuck'); INSERT INTO Tree VALUES ('Eddie', 'Chuck'); INSERT INTO Tree VALUES ('Fred', 'Chuck'); -- Stack starts empty, will holds the nested set model CREATE TABLE Stack (stack_top INTEGER NOT NULL, emp CHAR(10) NOT NULL, lft INTEGER, rgt INTEGER);
Each row of the stack holds the nested set (lft, rgt) pair, the node value (emp) and an integer that represents the current top of the stack as an integer. When the stack_top is positive, something has been pushed onto the stack. When the stack_top is negative, it has been popped off the stack.
The algorithm is pretty straight forward, though there are some tricks about representing a stack in T-SQL. Here is what we know:
- We will do (2 * (SELECT COUNT(*) FROM Tree)) operation to build the (lft, rgt) pairs for each node. We need a general counter for this.
- When a node is pushed on the stack, we give it a lft number and increment the counter.
- When a node is popped from the stack, we give it a rgt number and increment the counter.
- We start at the root. Each nodes goes on and off the stack once and only once.
- We look at the top of the stack and push the youngest subordinate of that node onto the stack
- When the node on the top of stack is a leaf node or a node without "un-popped" subordinates back in the tree, pop it off the stack.
Here is the code in T-SQL:
DROP TABLE Stack; CREATE TABLE Stack (stack_top INTEGER NOT NULL, child VARCHAR(10) NOT NULL, lft INTEGER NOT NULL, rgt INTEGER); -- you can create optional indexes on stack_top and child columns BEGIN DECLARE @lft_rgt INTEGER, @stack_pointer INTEGER, @max_lft_rgt INTEGER; SET @max_lft_rgt = 2 * (SELECT COUNT(*) FROM Tree); INSERT INTO Stack SELECT 1, child, 1, @max_lft_rgt FROM Tree WHERE parent IS NULL; SET @lft_rgt = 2; SET @Stack_pointer = 1; DELETE FROM Tree WHERE parent IS NULL; -- The Stack is now loaded and ready to use WHILE (@lft_rgt < @max_lft_rgt) BEGIN IF EXISTS (SELECT * FROM Stack AS S1, Tree AS T1 WHERE S1.child = T1.parent AND S1.stack_top = @stack_pointer) BEGIN -- push when stack_top has subordinates and set lft value INSERT INTO Stack SELECT (@stack_pointer + 1), MIN(T1.child), @lft_rgt, NULL FROM Stack AS S1, Tree AS T1 WHERE S1.child = T1.parent AND S1.stack_top = @stack_pointer; -- remove this row from Tree DELETE FROM Tree WHERE child = (SELECT child FROM Stack WHERE stack_top = @stack_pointer + 1); SET @stack_pointer = @stack_pointer + 1; END -- push ELSE BEGIN -- pop the Stack and set rgt value UPDATE Stack SET rgt = @lft_rgt, stack_top = -stack_top WHERE stack_top = @stack_pointer SET @stack_pointer = @stack_pointer - 1; END; -- pop SET @lft_rgt = @lft_rgt + 1; END; -- if END; -- while SELECT * FROM Stack ORDER BY lft; Stack stack_top emp lft rgt ----------------------------- -1 Albert 1 12 -2 Bert 2 3 -2 Chuck 4 11 -3 Donna 5 6 -3 Eddie 7 8 -3 Fred 9 10
Note that the leftover stack_top numbers are the negatives of the depth of their node in the original tree. Also, notice that the original tree is being destroyed in this procedure; you might want to save and use a copy in a temporary table instead.
About the Author
Joe Celko is author of SQL for Smarties: Advanced SQL Programming (Morgan-Kaufmann, 1999).