Q
Problem solve Get help with specific problems with your technologies, process and projects.

Optimize SQL query

How can I rewrite the following SQL query in an efficient manner?

How can I rewrite the following in an efficient manner?

SELECT DISTINCT(store_num),
(SELECT COUNT(SUBSTR(tran_pd,7,1)) FROM store_sales s WHERE SUBSTR(tran_pd,7,1) = 1 AND store = s.store) AS WEEK1,
(SELECT COUNT(SUBSTR(tran_pd,7,1)) FROM store_sales s WHERE SUBSTR(tran_pd,7,1) = 2 AND store = s.store) AS WEEK2,
(SELECT COUNT(SUBSTR(tran_pd,7,1)) FROM store_sales s WHERE SUBSTR(tran_pd,7,1) = 3 AND store = s.store) AS WEEK3,
(SELECT COUNT(SUBSTR(tran_pd,7,1)) FROM store_sales s WHERE SUBSTR(tran_pd,7,1) = 4 AND store = s.store) AS WEEK4,
(SELECT COUNT(SUBSTR(tran_pd,7,1)) FROM store_sales s WHERE SUBSTR(tran_pd,7,1) = 5 AND store = s.store) AS WEEK5
FROM store_sales GROUP BY store_num

I was hoping there is a SQL function out there that could do this but could not find any.

To begin, we must remove the parentheses from around store_num in the SELECT list. Please see the article DISTINCT is not a function (26 September 2005). DISTINCT always applies to all columns selected.

Next, notice that there is a GROUP BY on store_num. This should prompt us to question whether the DISTINCT is really needed. We must therefore do a quick evaluation of all the columns in the SELECT list. Notice there are five almost identical subqueries in the SELECT. Now, a subquery is allowed to be written in the SELECT only if it returns a single (scalar) value (hence, this type of subquery is called a scalar subquery). But more importantly, these five subqueries are correlated subqueries, and each of them will return its value, a count, based on which store it is. Thus, for the same store, the subquery always returns the same count. Therefore, all the rows produced by the query will be unique, and thus either the DISTINCT or the GROUP BY can be removed.

Hey, wait a second! What are those subqueries actually counting? Answer: the number of transactions for each of five weeks. Does this mean that for every store, five different queries against the entire transaction table are executed? Answer: yes. Can this be simplified? Answer: yes.

select store_num
     , substr(tran_pd,7,1) as weekno
     , count(*)            as trans
  from store_sales
 where substr(tran_pd,7,1)
       between 1 and 5
group
    by store_num
     , substr(tran_pd,7,1)

This query produces the same counts as your original query. The WHERE clause can be eliminated if 1 through 5 are the only possible weeks. Let's call this our grouping query. The layout of the result set will look like this:

 store_num  weekno  trans
    105        1       9
    105        2      37
    105        3       4
    105        4      42
    105        5      11
    107        1      21
    107        2       5
    107        3      16
    ...

But your original query produces output that looks like this:

 store_num  week1 week2 week3 week4 week5
    105        9    37     4    42    11
    107       21     5    16   ...

To produce the same layout, we query our grouping query like this:

select store_num
     , sum(case when weekno = 1
                then trans end) as week1
     , sum(case when weekno = 2
                then trans end) as week2
     , sum(case when weekno = 3
                then trans end) as week3
     , sum(case when weekno = 4
                then trans end) as week4
     , sum(case when weekno = 5
                then trans end) as week5
  from (
       select store_num
            , substr(tran_pd,7,1) as weekno
            , count(*)            as trans
         from store_sales
        where substr(tran_pd,7,1)
              between 1 and 5
       group
           by store_num
            , substr(tran_pd,7,1)
       ) as d
group
    by store_num

Here the subquery in the FROM clause is our grouping query. When a subquery is used in the FROM clause, it returns a result set of possibly (usually) many columns, and possibly (almost always) many rows, unlike a scalar subquery. A subquery in the FROM clause is often called an inline view or derived table.

The key to the efficiency of this query is that the counting and grouping of transactions is done in the inner query, the derived table. It is done once, efficiently. Then the outer query simply summarizes the counts in a "crosstab" aggregating query.

This was last published in November 2006

Dig Deeper on Oracle and SQL

PRO+

Content

Find more PRO+ content and other member only offers, here.

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

Start the conversation

Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

-ADS BY GOOGLE

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close