TOP n WITH TIES in MySQL
I have this query:
Continue Reading This Article
Enjoy this article as well as all of our content, including E-Guides, news, tips and more.
select state, count(*) as customers from customer group by state order by customers desc limit 2
This works fine, but what if the query returns two values, meaning two records have the same number in the database?
Assuming from the LIMIT clause that this is MySQL, your query will return the "top two" rows (top two states by number of customers), and you seem to be concerned about ties. Rightly so. In your specific example, LIMIT 2 returns the top two rows, so if there's a two-way tie for top, you're okay. But if there's a two-way tie for second place, you're not.
Consider a quiz which has thirteen questions. Joe gets all thirteen questions right, Mary gets only five questions right, and everybody else gets twelve right. Who are the top two finishers in this quiz?
Joe | 13 |
Tom | 12 |
Dick | 12 |
Harry | 12 |
Curly | 12 |
Larry | 12 |
Moe | 12 |
... | 12 |
Notice that everybody is on the list except Mary. (The list is actually a lot longer, but showing all of it would be silly, wouldn't it.)
If we announce the results as "Joe won first prize with a perfect 13 and Tom won second prize with a 12," we will surely hear from some other angry contestants. Rightly so. Perhaps the number of customers by state is not likely to yield as many ties, but we should still address the problem.
So what to do? We can arbitrarily limit the results to two, which is certainly safe if we were showing, say, population by state, where the likelihood of a tie is vanishingly small. My preference is to show the top two results, no matter how many rows have them. In other words, to include ties.
In SQL Server it is possible to say TOP n WITH TIES. End of problem.
In MySQL, at least in versions prior to 4.1 which introduced support for subqueries, you'll have to use LIMIT. Set it high enough so that you feel confident you will return more than the necessary number of rows, or leave LIMIT off altogether and return all 50 states. Then loop over the resulting rows with a scripting language and programmatically detect ties with current/previous logic.