Top 3 counts, revisited

In Top 3 counts querying two tables (27 June 2005), there is a potential trap. Actually it can exist in any "top" type report. Let's say that you are going to pay commission bonuses to the top three sales performers this month out of 100 reps. If a query is written just to get the top three, you might find that Rep11 = $100,000, Rep13 = $98,500, Rep28 = $97,120 and Rep41 = $97,120. All of the other reps are below $97,000. However, just retrieving the top three will probably miss the fourth (equally deserving) rep. I have run into this in more than one "top rank" type report. It might be wise to check that no TIES exist. I believe that Oracle has a ties function that allows for just this circumstance. In MySQL (version 3.23.xx) I usually run a minimal test query with the limit, i.e. LIMIT 3, capture the lowest value, and then run the actual query using HAVING as the restriction. You have been very helpful to me in the past and I really appreciate your insights.

    Requires Free Membership to View

    By submitting your registration information to SearchOracle.com you agree to receive email communications from TechTarget and TechTarget partners. We encourage you to read our Privacy Policy which contains important disclosures about how we collect and use your registration and other information. If you reside outside of the United States, by submitting this registration information you consent to having your personal data transferred to and processed in the United States. Your use of SearchOracle.com is governed by our Terms of Use. You may contact us at webmaster@TechTarget.com.

You are right to be concerned about ties. However, the problem you mention happens only if you're using (one version of) TOP, or LIMIT. The queries given in my previous answer actually do take ties into consideration.

As far as using TOP or LIMIT is concerned, I'm not sure about Oracle, but in Microsoft Access, ties are automatically included in TOP queries, and in Microsoft SQL Server, you can say TOP n WITH TIES. In MySQL, LIMIT works strictly on rows, not on values. But all of this is proprietary syntax, and standard SQL would be much preferable.

More specifically, let's have a look at the generic top 10 query which was given in FIRST N rows, TOP N rows, LAST N rows, BOTTOM N rows...:

select foo
     , bar
  from yourtable X 
 where ( select count(*) 
           from yourtable  
          where foo > X.foo ) < 10

Let's see if we can adapt the generic SQL to your example. We'll add a couple more rows for demonstration purposes:

create table repsales
( repno varchar(7)
, sales integer
)
insert into repsales values ( 'Rep13',  98500 ) 
insert into repsales values ( 'Rep28',  97120 ) 
insert into repsales values ( 'Rep11', 100000 ) 
insert into repsales values ( 'Rep41',  97120 ) 
insert into repsales values ( 'Rep9',   90937 )
insert into repsales values ( 'Rep37',  91937 )

Now let's try the generic query:

select repno
     , sales
  from repsales X 
 where ( select count(*) 
           from repsales  
          where sales > X.sales ) < 3
order
    by sales desc 

Results:

repno   sales       
------- ----------- 
Rep11   100000
Rep13   98500
Rep28   97120
Rep41   97120

Ties are included, as expected.

By the way, you probably already know this, but you cannot use subqueries in MySQL versions before 4.1. (Version 3.23 is positively ancient.) So you're stuck using LIMIT.

However, your "workaround" to run a test query to look at the third highest number is actually an excellent strategy. You may just need to do it a few times. An example of this is given in TOP n WITH TIES in MySQL (3 June 2004).

This was first published in September 2005

Join the conversationComment

Share
Comments

    Results

    Contribute to the conversation

    All fields are required. Comments will appear at the bottom of the article.