Problem solve Get help with specific problems with your technologies, process and projects.

Predicting query performance on larger data sets

Regarding the Extracting the highest record from a group tip March 13 2002, I approached the problem a little differently, but got the same results.

After looking at the estimated execution plan for both queries, the work distribution was exactly the same (77% scan, 23% sort), which I assume is because the size of the recordset is so small. My question is: do you think there would be a significant performance difference between the approaches when using a much larger data set? The SQL I used was:

select a.column1, a.column2, a.column3
     , a.rec_no, a.column5
  from yourtable as a 
 inner join
       ( select column1, column2, column3
              , max(rec_no) as maxrec_no
           from yourtable
       group by column1, column2, column3
       ) as b
    on ( a.column1 = b.column1 
     AND a.column2 = b.column2 
     AND a.column3 = b.column3 
     And a.rec_no = b.maxrec_no )


For comparison, here's the solution from that previous answer --

select colum1, colum2, colum3
     , rec_no, colum5 
  from yourTable XX
 where rec_no = 
       ( select max(rec_no)
           from yourtable
          where colum1 = XX.colum1
            and colum2 = XX.colum2
            and colum3 = XX.colum3

Performance depends on so many things that I'm going to wimp out and just say YMMV -- your mileage may vary. If the optimizer shows the same execution plan for both queries now, it may or may not execute them the same on larger table sizes. It depends on what goes on "under the covers" when the database decides what order to do stuff in. I'm sorry I cannot offer a more technical explanation.

I am reassured that you got the same results from each query. There are many ways of optimizing database performance, and experimenting with different query structures is not one of the more fruitful. I do not claim to understand how optimizers actually execute correlated subqueries, or inner joins against a grouped subset of columns. I am reassured further that you got the same results in what looks to have been the same way "under the covers."

You might be able to improve the performance -- perhaps dramatically so -- of one or both queries by the creation of an index on column1, column2, column3, and rec_no, because then the subquery with the GROUP BY needn't retrieve rows, it can look at only the index. The index would have to contain all four columns, I think, to be of any use.

Dig Deeper on Oracle and SQL

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.