Q

What does this theta self-join mean?

Could you please explain to me the DB2 query given below?? It is a self join, but the part which I cannot understand

is how it manages to generate sequential order numbers with the help of self join and where clause.

SELECT count(*) + 1000,
 A.SLS_TM_NBR,
 A.FFO_TYP_CD,
 A.FFO_NBR
FROM
 SP1SLADB.VWK_ORD_GEN4 A,
 SP1SLADB.VWK_ORD_GEN4 B
WHERE
 ((A.SLS_TM_NBR > B.SLS_TM_NBR) OR
 (A.SLS_TM_NBR = B.SLS_TM_NBR AND
?? A.FFO_TYP_CD > B.FFO_TYP_CD) OR
 (A.SLS_TM_NBR = B.SLS_TM_NBR AND
?? A.FFO_TYP_CD = B.FFO_TYP_CD AND
?? A.FFO_NBR >= B.FFO_NBR))
GROUP BY
 A.SLS_TM_NBR,
 A.FFO_TYP_CD,
 A.FFO_NBR

Don't let the complex join condition throw you. This self-join simply counts rows which are "less than" the row in question. This type of join is not an equi-join, where rows are matched that have equal values in some key. Rather, each row is joined to all the rows that satisfy the inequality. This is called a theta join even though few people use this terminology. (I just call it a "less than" join.) With each row joined to all its lesser rows, the GROUP BY generates a COUNT(*) which simply indicates how many lesser rows each row has.

One example where this type of join is useful is in determining a ranking. Imagine a table like this --

theID  theValue
  a       6
  b       8
  c      12
  d       7
  e      11
  f      15
  g      13
  h      21
  i       9

Running a simple query with an ORDER BY will get you the records in the right sequence, but using a theta self-join will generate rankings. Remember that the ranking is actually just the number of lesser rows --

select A.theValue
     , count(*) as theRank
  from theTable as A
     , theTable as B
where A.theValue >= B.theValue
group by
       A.theValue

theValue theRank 6 1 7 2 8 3 9 4 11 5 12 6 13 7 15 8 21 9

Notice that the results are ranked from smallest to largest. This would be the case when ranking golf scores or demerit points, say. For other queries, where the top rank is the largest, the theta condition would be a "greater than" comparison instead of "less than." Also, notice I had to use "less than or equal to" to generate a result row for every row in the A table. Otherwise, the lowest row is left out, as there are no rows lower than the lowest.

Warning: the above query will work correctly only if there are no "ties." If you have ties in your data, the ranking query is substantially more complex.

So, given that the theta join conveniently produces a number, which may or may not be a ranking, but which in your example was further massaged by adding three zeroes to it, is this a good way to generate order numbers? The answer is maybe. It is not a good method if there are ties, because then there will be more than one row with the same number of lesser rows, and hence the same order number!! For your own comfort, double-check to make sure there are UNIQUE constraints on the table columns that are used in the WHERE clause.


This was first published in July 2001

Dig deeper on Oracle and SQL

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close