Ask the Expert

What does this theta self-join mean?

Could you please explain to me the DB2 query given below?? It is a self join, but the part which I cannot understand is how it manages to generate sequential order numbers with the help of self join and where clause.

SELECT count(*) + 1000,

Don't let the complex join condition throw you. This self-join simply counts rows which are "less than" the row in question. This type of join is not an equi-join, where rows are matched that have equal values in some key. Rather, each row is joined to all the rows that satisfy the inequality. This is called a theta join even though few people use this terminology. (I just call it a "less than" join.) With each row joined to all its lesser rows, the GROUP BY generates a COUNT(*) which simply indicates how many lesser rows each row has.

One example where this type of join is useful is in determining a ranking. Imagine a table like this --

theID  theValue
  a       6
  b       8
  c      12
  d       7
  e      11
  f      15
  g      13
  h      21
  i       9

Running a simple query with an ORDER BY will get you the records in the right sequence, but using a theta self-join will generate rankings. Remember that the ranking is actually just the number of lesser rows --

select A.theValue
     , count(*) as theRank
  from theTable as A
     , theTable as B
where A.theValue >= B.theValue
group by

theValue theRank 6 1 7 2 8 3 9 4 11 5 12 6 13 7 15 8 21 9

Notice that the results are ranked from smallest to largest. This would be the case when ranking golf scores or demerit points, say. For other queries, where the top rank is the largest, the theta condition would be a "greater than" comparison instead of "less than." Also, notice I had to use "less than or equal to" to generate a result row for every row in the A table. Otherwise, the lowest row is left out, as there are no rows lower than the lowest.

Warning: the above query will work correctly only if there are no "ties." If you have ties in your data, the ranking query is substantially more complex.

So, given that the theta join conveniently produces a number, which may or may not be a ranking, but which in your example was further massaged by adding three zeroes to it, is this a good way to generate order numbers? The answer is maybe. It is not

    Requires Free Membership to View

a good method if there are ties, because then there will be more than one row with the same number of lesser rows, and hence the same order number!! For your own comfort, double-check to make sure there are UNIQUE constraints on the table columns that are used in the WHERE clause.

This was first published in July 2001

There are Comments. Add yours.

TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
Sort by: OldestNewest

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to: