I want to display all the employees who report to a person. My data is like a tree structure.
eid ename mgrid 1 Emp1 0 2 Emp2 1 3 Emp3 1 4 emp4 2 5 emp5 3 6 emp6 4
For example, if I give eid as 2, I need report as:
eid ename mgrid 4 emp4 2 6 emp6 4
I need this data in the report because
- eid 4's mgrid is 2
- eid 6's mgrid is 4.
How to do that using SQL?
This is a great example of a requirement to return a subtree. (Please refer to this Wikipedia page if you do not recognize terms like subtree, node, or level.)
Suppose the given eid is 2. So we are looking for all nodes that are in the subtree of 2. Here is one solution which uses nested subqueries:
select eid , ename , mgrid from employees as t1 where mgrid = 2 or exists ( select * from employees as t2 where eid = t1.mgrid and ( mgrid = 2 or exists ( select * from employees as t3 where eid = t2.mgrid and mgrid = 2 ) ) )
In effect, the query returns all employees:
- whose manager is 2, or
- whose manager's manager is 2, or
- whose manager's manager's manager is 2
There are many approaches to subtree problems, and this one certainly appears straightforward enough. However, our choice of solution should always be governed by practical considerations.
Is this a practical solution? Absolutely, yes. We have only four levels of data. Therefore, the query needs to "walk down" at most three levels from any given node. Try the query with the given node 1 to confirm.
But what if there are actually five levels of data and not four? Will the query still work?
Of course the query will still work! <grin> It will correctly return complete subtrees when there are one, two, or three levels below the given node. However, if a given node has more than three levels below it, only the first three levels down from that node will be returned.
What to do, what to do? Obviously, one answer is simply to extend the nested subquery pattern in the query to an additional level. Practical? You betcha!
But what if there actually is an unknown number of levels of data and not five? What then?
For the employee hierarchy, in the real world there is no such thing as an unknown number of levels. Ask the HR department. Then extend the query as many levels as necessary, plus one or two just to be safe. The number of levels is not really the issue here. Instead, it is the more practical consideration of whether umpteen levels of data producing an exponential bazillion rows of output is really of any use to anyone. Perhaps limiting the results to four levels down isn't such a bad idea. See also Categories and Subcategories for additional thoughts and query examples.
The employee hierarchy is an example of the adjacency list model, recognized by its use of a "parent id" (in this case mgrid). For other data structures, where there really is an unknown number of levels, consider using a different data model. See Trees in SQL by Joe Celko, which describes the nested set model, an alternative to the adjacency list model.
Dig Deeper on Oracle and SQL
Related Q&A from Rudy Limeback
Read SQL expert Rudy Limeback's advice for counting combinations in a table with SQL's GROUP BY clause Continue Reading
Read about the Mimer Validator, a tool used to verify your SQL code, in this tip from SQL expert Rudy Limeback. Continue Reading
Read an example of an SQL case expression from our SQL expert Rudy Limeback. Continue Reading