As you will recall from my previous article, the usual example of a tree
structure in SQL books is called an adjacency list model and it looks like this:
CREATE TABLE Personnel
(emp CHAR(10) NOT NULL PRIMARY KEY,
boss CHAR(10) DEFAULT NULL REFERENCES Personnel(emp));
Personnel
emp boss
===================
'Albert' 'NULL'
'Bert' 'Albert'
'Chuck' 'Albert'
'Donna' 'Chuck'
'Eddie' 'Chuck'
'Fred' 'Chuck'
Another way of representing trees is to show them as nested sets. Since
SQL is a set oriented language, this is a better model than the usual
adjacency list approach you see in most text books. Let us define a simple
Personnel table like this, ignoring the left (lft) and right (rgt) columns
for now. This problem is always given with a column for the employee and
one for his boss in the textbooks. This table without the lft and rgt
columns is called the adjacency list model, after the graph theory
technique of the same name; the pairs of nodes are adjacent to each other.
CREATE TABLE Personnel
(emp CHAR(10) NOT NULL PRIMARY KEY,
lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
CONSTRAINT order_okay CHECK (lft < rgt) );
Personnel
emp lft rgt
======================
'Albert' 1 12
'Bert' 2 3
'Chuck' 4 11
'Donna' 5 6
'Eddie' 7 8
'Fred' 9 10
The organizational chart would look like this as a directed graph:
Albert (1,12)
/ \
/ \
Bert (2,3) Chuck (4,11)
/ | \
/ | \
/ | \
/ | \
Donna (5,6) Eddie 7,8) Fred (9,10)
To show a tree as nested sets, replace the nodes with ovals, then nest
subordinate ovals inside each other. The root will be the largest oval and
will contain every other node. The leaf nodes will be the innermost ovals
with nothing else inside them and the nesting will show the hierarchical
relationship. The rgt and lft columns (I cannot use the reserved words
LEFT and RIGHT in SQL) are what shows the nesting.
To convert the graph into a nested sets model think of a little worm
crawling along the tree. The worm starts at the top, the root, makes a
complete trip around the tree. When he comes to a node, he puts a number
in the cell on the side that he is visiting and increments his counter.
Each node will get two numbers, one of the right side and one for the
left. Computer Science majors will recognize this as a modified preorder
tree traversal algorithm.
The code for implementing this in T-SQL is a straight forward stack
implementation. First, let's load up some data into a tree table and then
create a stack table. I will explain how the stack works in a minute.
-- Tree holds the adjacency model
CREATE TABLE Tree
(emp CHAR(10) NOT NULL,
boss CHAR(10));
-- insert the sample data for testing
INSERT INTO Tree VALUES ('Albert', NULL);
INSERT INTO Tree VALUES ('Bert', 'Albert');
INSERT INTO Tree VALUES ('Chuck', 'Albert');
INSERT INTO Tree VALUES ('Donna', 'Chuck');
INSERT INTO Tree VALUES ('Eddie', 'Chuck');
INSERT INTO Tree VALUES ('Fred', 'Chuck');
-- Stack starts empty, will holds the nested set model
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
emp CHAR(10) NOT NULL,
lft INTEGER,
rgt INTEGER);
Each row of the stack holds the nested set (lft, rgt) pair, the node value
(emp) and an integer that represents the current top of the stack as an
integer. When the stack_top is positive, something has been pushed onto
the stack. When the stack_top is negative, it has been popped off the
stack.
The algorithm is pretty straight forward, though there are some tricks
about representing a stack in T-SQL. Here is what we know:
-
We will do (2 * (SELECT COUNT(*) FROM Tree)) operation to build the
(lft, rgt) pairs for each node. We need a general counter for this.
-
When a node is pushed on the stack, we give it a lft number and
increment the counter.
-
When a node is popped from the stack, we give it a rgt number and
increment the counter.
-
We start at the root. Each nodes goes on and off the stack once and
only once.
-
We look at the top of the stack and push the youngest subordinate of
that node onto the stack
-
When the node on the top of stack is a leaf node or a node without
"un-popped" subordinates back in the tree, pop it off the stack.
Here is the code in T-SQL:
DROP TABLE Stack;
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
child VARCHAR(10) NOT NULL,
lft INTEGER NOT NULL,
rgt INTEGER);
-- you can create optional indexes on stack_top and child columns
BEGIN
DECLARE @lft_rgt INTEGER, @stack_pointer INTEGER, @max_lft_rgt INTEGER;
SET @max_lft_rgt = 2 * (SELECT COUNT(*) FROM Tree);
INSERT INTO Stack
SELECT 1, child, 1, @max_lft_rgt
FROM Tree
WHERE parent IS NULL;
SET @lft_rgt = 2;
SET @Stack_pointer = 1;
DELETE FROM Tree
WHERE parent IS NULL;
-- The Stack is now loaded and ready to use
WHILE (@lft_rgt < @max_lft_rgt)
BEGIN
IF EXISTS (SELECT *
FROM Stack AS S1, Tree AS T1
WHERE S1.child = T1.parent
AND S1.stack_top = @stack_pointer)
BEGIN -- push when stack_top has subordinates and set lft value
INSERT INTO Stack
SELECT (@stack_pointer + 1), MIN(T1.child), @lft_rgt, NULL
FROM Stack AS S1, Tree AS T1
WHERE S1.child = T1.parent
AND S1.stack_top = @stack_pointer;
-- remove this row from Tree
DELETE FROM Tree
WHERE child = (SELECT child
FROM Stack
WHERE stack_top = @stack_pointer + 1);
SET @stack_pointer = @stack_pointer + 1;
END -- push
ELSE
BEGIN -- pop the Stack and set rgt value
UPDATE Stack
SET rgt = @lft_rgt,
stack_top = -stack_top
WHERE stack_top = @stack_pointer
SET @stack_pointer = @stack_pointer - 1;
END; -- pop
SET @lft_rgt = @lft_rgt + 1;
END; -- if
END; -- while
SELECT * FROM Stack ORDER BY lft;
Stack
stack_top emp lft rgt
-----------------------------
-1 Albert 1 12
-2 Bert 2 3
-2 Chuck 4 11
-3 Donna 5 6
-3 Eddie 7 8
-3 Fred 9 10
Note that the leftover stack_top numbers are the negatives of the depth of
their node in the original tree. Also, notice that the original tree is
being destroyed in this procedure; you might want to save and use a copy in
a temporary table instead.
About the Author
Joe Celko is author of SQL for Smarties: Advanced SQL Programming (Morgan-Kaufmann, 1999).