Comparing trees in the nested sets model

Some SQL code to compare nodes and structures of trees in the Nested Sets model for hierarchies.

Code updated on July 31, 2002.

Here is some SQL code to compare nodes and structures of trees in the Nested Sets model for hierarchies. Rudy Limeback replied to a question about comparing rows in different tables on 21 September 2001. The requester was asking it in regards to my nested sets model of trees, so I would like to reply.

There are really several kinds of equality comparisons when you are dealing with a hierarchy:

  1. Same nodes in both tables
  2. Same structure in both tables
  3. Same nodes and structure in both tables -- they are identical

Let me once more invoke my semi-quasi-famous organization chart in the nested sets model. If you do not understand the nested sets model, then read the earlier articles here and here:

 CREATE TABLE Personnel 
 (emp CHAR(10) NOT NULL PRIMARY KEY, 
 lft INTEGER NOT NULL UNIQUE CHECK (lft > 0), 
 rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1), 
 CONSTRAINT order_okay CHECK (lft < rgt) ); 

Insert sample data:

 Personnel 
 emp         lft  rgt 
 ====================== 
 'Albert'      1   12 
 'Bert'        2    3 
 'Chuck'       4   11 
 'Donna'       5    6 
 'Eddie'       7    8 
 'Fred'        9   10 

The organizational chart would look like this as a directed graph:

           Albert (1,12) 
           /        \
         /           \ 
   Bert (2,3)    Chuck (4,11) 
                  /    |    \ 
                /      |     \ 
              /        |      \ 
            /          |       \  
       Donna (5,6)  Eddie (7,8)  Fred (9,10) 

Case one: Same nodes

Let's create a second table with the same nodes, but with a different structure:

 CREATE TABLE Personnel_2 
 (emp CHAR(10) NOT NULL PRIMARY KEY, 
 lft INTEGER NOT NULL UNIQUE CHECK (lft > 0), 
 rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1), 
 CONSTRAINT order_okay CHECK (lft < rgt) ); 

Insert sample data:

 Personnel_2 
 emp         lft  rgt 
 ====================== 
 'Albert'      1   12 
 'Bert'        2    3 
 'Chuck'       4    5 
 'Donna'       6    7 
 'Eddie'       8    9 
 'Fred'       10   11 

                     Albert(1,12) 
                           | 
         ------------------------------------------       
         |       |         |          |           | 
   Bert(2,3) Chuck(4,5) Donna(6,7) Eddie(8,9) Fred(10,11) 

Rudy gave a few different ways of matching nodes between Personnel and Personnel_2. Here is one way that is a little different:

SELECT 'They have the same nodes'
   FROM Personnel_2 AS P0
        FULL OUTER JOIN
        Personnel_3 AS P1
        ON P0.emp = P1.emp
 HAVING (SELECT COUNT(*) FROM Personnel_2)
        = (SELECT COUNT(*) FROM Personnel_3)
    AND (SELECT COUNT(*) FROM Personnel_2)
        = COUNT(*)
    AND (SELECT COUNT(*) FROM Personnel_3)
        = COUNT(*);

In fairness, a FULL OUTER JOIN is hard to find in many SQL products, even today. The idea is that anyone without a match in the other table will get generated NULLs and become a row by himself for the count.

Case two: Same structure

I will not bother with an ASCII drawing, but let's present a table with sample data that has different people inside the same structure.

 Personnel_3 
 emp         lft  rgt 
 ====================== 
 'Amber'      1   12 
 'Bobby'      2    3 
 'Charles'    4   11 
 'Donald'     5    6 
 'Edward'     7    8 
 'Frank'      9   10 

If these things are really separate trees, then life is easy. You do the same join as before but with the (lft,rgt) pairs:

 SELECT DISTINCT 'They have the same structure' 
   FROM Personnel AS P0 
  WHERE COUNT(P0.emp) 
        = (SELECT COUNT(*) 
             FROM Personnel AS P0 
                  FULL OUTER JOIN 
                  Personnel_3 AS P1 
                  ON P0.lft = P1.lft 
                     AND P0.rgt = P1.rgt); 

Case three: Same nodes and same structure

This is not a big jump to simply extend the predicate:

 SELECT DISTINCT 'They are identical' 
   FROM Personnel AS P0 
  WHERE COUNT(P0.emp) 
        = (SELECT COUNT(*) 
            FROM Personnel AS P0 
                 FULL OUTER JOIN 
                 Personnel_2 AS P1 
                 ON P0.lft = P1.lft 
                    AND P0.rgt = P1.rgt 
                    AND P0.emp = P1.emp); 

But more often than not, you will be comparing subtrees within the same tree. This is best handled by putting the two sub-trees into a canonical form. First you need the root node (i.e the emp in this sample data) and then you can re-number the (lft, rgt) pairs with a derived table of this form:

 (SELECT emp, 
         lft - (SELECT MIN(lft FROM Personnel), 
         rgt - (SELECT MIN(lft FROM Personnel) 
    FROM Personnel AS P0 
   WHERE P0.lft 
         BETWEEN (SELECT lft 
                    FROM Personnel 
                   WHERE emp = :emp_root_subtree_2) 
             AND (SELECT rgt 
                    FROM Personnel 
                   WHERE emp = :emp_root_subtree_2)) 
  AS CP0 (emp, lft, rgt) 

The derived table with have its (lft,rgt) pairs start at zero instead of one. That is actually an advantage when you insert a canonical tree structure into an existing table, but that is another article.

If you would like to see this all in one monster query:

 SELECT CASE COUNT(P.emp) 
        WHEN (SELECT COUNT(*) 
                FROM Personnel AS P0 
                     FULL OUTER JOIN 
                     Personnel_3 AS P1 
                     ON P0.lft = P1.lft 
                        AND P0.rgt = P1.rgt 
                        AND P0.emp = P1.emp) 
        THEN 'The trees are identical' 
        WHEN (SELECT COUNT(*) 
                FROM Personnel AS P0 
                     FULL OUTER JOIN 
                     Personnel_2 AS P1 
                     ON P0.emp = P1.emp) 
        THEN 'The trees have the same nodes' 
        WHEN (SELECT COUNT(*) 
                FROM Personnel AS P0 
                     FULL OUTER JOIN 
                     Personnel_2 AS P1 
                     ON P0.lft = P1.lft 
                        AND P0.rgt = P1.rgt) 
        THEN 'The trees have the same structure' 
        ELSE 'The trees have different node sets and structure' 
   FROM Personnel AS P; 

This depends on the CASE expression operating from left to right and returning the first TRUE clause it finds. Since there will a lot of indexes on the tree tables, joins and aggregate functions should be relatively fast to perform. For an exercise, try writing the same structure queries with the adjacency list model.

About the Author

Joe Celko is author of SQL for Smarties: Advanced SQL Programming (Morgan-Kaufmann, 1999).

For More Information


This was first published in March 2002
This Content Component encountered an error

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

-ADS BY GOOGLE

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close