Q

Top N subcategories, Part 3

I want a query to summarize categories of information so that the top N subcategories within each major category

are displayed. All category codes and associated elements are in the same table.

In Part 1 and Part 2 of this answer, we created and loaded the category table, and developed two queries for displaying the category hierarchy. This part of the answer discusses the actual "top N" query. For the sake of example, let's let N=3.

Basically, all we need to do is add a WHERE condition to the join query developed in Part 2. This condition involves using a correlated subquery which selects the top 3 subcategories that have the same parent. That's the key to solving this problem -- the TOP function needs to operate only on the subcategories that belong to the same parent category. Note that TOP is a proprietary extension to SQL that works in SQL Server and Access; other databases have other methods for this.

 select cat.id , cat.descr , subcat.id , subcat.descr , subcat.salesamt from categories cat left outer join categories subcat on cat.id = subcat.parentid where cat.parentid is null and subcat.id in ( select top 3 id from categories where parentid = cat.id order by salesamt desc ) order by cat.id , subcat.salesamt desc
id descr id descr salesamt ------ --------------- ------ --------------- ----------- 1 Category A 8 subcat A4 255 1 Category A 6 subcat A2 207 1 Category A 9 subcat A5 159 2 Category B 14 subcat B3 377 2 Category B 12 subcat B5 315 2 Category B 15 subcat B2 315 4 Category D 11 subcat D2 195 4 Category D 10 subcat D1 150

Notice that categories A and B each have 3 subcategories, as expected, while category C is not in the result set, and category D has only two.

But is this really the right answer? I would say no. Look at this query:

 select cat.id , cat.descr , subcat.id , subcat.descr , subcat.salesamt from categories cat left outer join categories subcat on cat.id = subcat.parentid where cat.parentid is null and subcat.id in ( select top 3 with ties id from categories where parentid = cat.id order by salesamt desc ) order by cat.id , subcat.salesamt desc
id descr id descr salesamt ------ --------------- ------ --------------- ----------- 1 Category A 8 subcat A4 255 1 Category A 6 subcat A2 207 1 Category A 9 subcat A5 159 2 Category B 14 subcat B3 377 2 Category B 12 subcat B5 315 2 Category B 15 subcat B2 315 2 Category B 16 subcat B1 315 4 Category D 11 subcat D2 195 4 Category D 10 subcat D1 150

The difference here is that category B now shows 4 subcategories in the top 3. In my opinion this is the correct answer because there's no compelling reason to discard any one of B5, B2, or B1 -- they're all tied for 2nd place!

See Part 4 for Oracle and generic solutions.


This was first published in October 2003

Dig deeper on Oracle and SQL

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close