As most database admins and developers know, SELECT DISTINCT queries are quite resource-intensive, especially if you are querying a large table. Even if there are only three or four distinct values in the table, DBMS's such as SQL Server must still search the entire table to get those.
I was faced with this challenge recently. I needed to query distinct values on a column in a huge table, and I knew that all values would be represented within the first thousand records. Therefore, I thought I could quickly retrieve the data I needed by running this statement:
SELECT DISTINCT TOP 1000 Type FROM BL_Print2
Yet even this statement, limited to 1000 rows, took an unacceptably long time to complete. More experimentation. Finally, I experimented with using a derived table to preselect a sampling of the data, and running my SELECT DISTINCT query on that derived table (a derived table is simply a SELECT subquery run within a T-SQL batch which creates a recordset from which the outer query selects). Bingo! The query completed in only two seconds:
SELECT DISTINCT Type FROM (SELECT TOP 1000 type FROM bl_print2) t
Here's how it works: SQL Server, for example, processes the inner query first, which returns the first 1000 rows of data. The outer query then runs the SELECT DISTINCT statement against this derived dataset, and since it only needs to search 1000 rows rather than the entire table, results are returned rapidly.
For More Information
- The Best SQL Web Links: tips, tutorials, scripts, and more.
- Have an SQL tip or code snippet to offer your fellow DBA's? The best tips submitted will receive a cool prize--submit your tip today!
- Ask your technical SQL questions--or help out your peers by answering them--in our live discussion forums.
- Ask the Experts: Our SQL, database design, Oracle, DB2, and SQL Server gurus are waiting to answer your toughest questions.