Predicting query performance on larger data sets

Predicting query performance on larger data sets

Regarding the Extracting the highest record from a group tip March 13 2002, I approached the problem a little differently, but got the same results.

After looking at the estimated execution plan for both queries, the work distribution was exactly the same (77% scan, 23% sort), which I assume is because the size of the recordset is so small. My question is: do you think there would be a significant performance difference between the approaches when using a much larger data set? The SQL I used was:

select a.column1, a.column2, a.column3
     , a.rec_no, a.column5
  from yourtable as a 
 inner join
       ( select column1, column2, column3
              , max(rec_no) as maxrec_no
           from yourtable
       group by column1, column2, column3
       ) as b
    on ( a.column1 = b.column1 
     AND a.column2 = b.column2 
     AND a.column3 = b.column3 
     And a.rec_no = b.maxrec_no )

Thanks....


    Requires Free Membership to View

    By submitting your registration information to SearchOracle.com you agree to receive email communications from TechTarget and TechTarget partners. We encourage you to read our Privacy Policy which contains important disclosures about how we collect and use your registration and other information. If you reside outside of the United States, by submitting this registration information you consent to having your personal data transferred to and processed in the United States. Your use of SearchOracle.com is governed by our Terms of Use. You may contact us at webmaster@TechTarget.com.

For comparison, here's the solution from that previous answer --

select colum1, colum2, colum3
     , rec_no, colum5 
  from yourTable XX
 where rec_no = 
       ( select max(rec_no)
           from yourtable
          where colum1 = XX.colum1
            and colum2 = XX.colum2
            and colum3 = XX.colum3
       )

Performance depends on so many things that I'm going to wimp out and just say YMMV -- your mileage may vary. If the optimizer shows the same execution plan for both queries now, it may or may not execute them the same on larger table sizes. It depends on what goes on "under the covers" when the database decides what order to do stuff in. I'm sorry I cannot offer a more technical explanation.

I am reassured that you got the same results from each query. There are many ways of optimizing database performance, and experimenting with different query structures is not one of the more fruitful. I do not claim to understand how optimizers actually execute correlated subqueries, or inner joins against a grouped subset of columns. I am reassured further that you got the same results in what looks to have been the same way "under the covers."

You might be able to improve the performance -- perhaps dramatically so -- of one or both queries by the creation of an index on column1, column2, column3, and rec_no, because then the subquery with the GROUP BY needn't retrieve rows, it can look at only the index. The index would have to contain all four columns, I think, to be of any use.


This was first published in March 2002

Join the conversationComment

Share
Comments

    Results

    Contribute to the conversation

    All fields are required. Comments will appear at the bottom of the article.