A sweet theta join
My table contains two columns: persnr and tmstmp. How can I calculate the difference between the last two timestamps per persnr, without putting a select(...) in the select clause?

    Requires Free Membership to View

    By submitting your registration information to SearchOracle.com you agree to receive email communications from TechTarget and TechTarget partners. We encourage you to read our Privacy Policy which contains important disclosures about how we collect and use your registration and other information. If you reside outside of the United States, by submitting this registration information you consent to having your personal data transferred to and processed in the United States. Your use of SearchOracle.com is governed by our Terms of Use. You may contact us at webmaster@TechTarget.com.

There may be other ways to do this, but here's a simple solution which involves a theta join. It's called a theta join for some obscure mathematical reason, but what it means is simply that it's not an equi-join. The join condition involves an inequality, in this case a "less than" comparison, rather than an equality.

select t1.persnr  
     , t1.tmstmp     as second_last_timestamp
     , max(t2.tmstmp)   as last_timestamp
  from yourtable t1
inner
  join yourtable t2
    on t1.persnr = t2.persnr
   and t1.tmstmp < t2.tmstmp     
group
    by t1.persnr
     , t1.tmstmp
having count(*) = 1    

First of all, this theta join is a self-join, which means that the table is joined to itself. More precisely, the first ON condition restricts the join so that only rows for the same persnr are joined together. The second ON condition restricts the possible combinations to only those where the first timestamp is less than the second.

If this second condition is not too clear, look at this example. Suppose we had values 5, 7, 9, and 37. If we were to "self-join" these values, we'd get:

 5   5
 5   7
 5   9
 5  37
 7   5
 7   7
 7   9
 7  37
 9   5
 9   7
 9   9
 9  37
37   5
37   7
37   9
37  37

Now, let's add the inequality condition, that the first value must be less than the second:

 5   7
 5   9
 5  37
 7   9
 7  37
 9  37

See how more than half of the combinations have been filtered out? And yet there are still more combinations that in an equi-join.

Now we add a final condition. Grouping each of these sets of numbers by the first number, we see in the HAVING clause that we want only those groups where there is only one row in the group, and there is only one such group:

 9  37

With the HAVING clause ensuring that there's only one t2 value for each t1 value, we can go ahead and use either MIN() or MAX() on the t2 value in the SELECT list (since they're equal). So the query nicely isolates the second last and last timestamp for each persnr.

All that's left to do is add to the SELECT list an expression to calculate the difference between max(t2.tmstmp), the last timestamp, and t1.tmstmp, the second last timestamp. This is not shown above because the date functions for timestamp differences can vary greatly from one database system to another.


This was first published in August 2004

Join the conversationComment

Share
Comments

    Results

    Contribute to the conversation

    All fields are required. Comments will appear at the bottom of the article.