Q
Problem solve Get help with specific problems with your technologies, process and projects.

Summing quantities in gapless sequences

Here's a tough one for our SQL expert: how to sum quantities in gapless sequences?

At an interview for a data warehousing position, they asked me to write a query to get the below result from given...

dataset:

DATA SET:

SMID  CSID  PURDATE  PURQTY
----  ----  -------  ------
1      1     200501    10
1      1     200502    12
1      1     200503    9

1      1     200507    10
1      1     200508    8

1      2     200505    10
1      2     200506    15

RESULT OF QUERY SHOULD BE:

SMID  CSID  STARTDT  ENDDATE  QTY
----  ----  -------  -------  ----
1      1    200501   200503    31
1      1    200507   200508    18
1      2    200505   200506    25

Unfortunately I could not figure out the expected answer. Please, can you take a look at it?


Oh, that's tricky. That's a pretty tough problem to throw at somebody in an interview.

Obviously what they were after was an analysis involving gap-less sequences. There are two sequences for SMID=1 CSID=1, because of the gap between 200503 and 200507.

First, let's find the sequences. This is accomplished by looking for values that occur just preceding and just following a possible sequence. If there are none, then we have a sequence, although it may have gaps:

select r1.SMID
     , r1.CSID
     , r1.PURDATE     as STARTDT
     , r2.PURDATE     as ENDDATE
     , ( select count(*) 
           from purchases
          where SMID = r1.SMID 
            and CSID = r1.CSID
            and PURDATE 
                between r1.PURDATE 
                    and r2.PURDATE ) as seq_count
     , r2.PURDATE - r1.PURDATE  + 1  as seq_diff
  from purchases as r1
inner
  join purchases as r2
    on r2.SMID = r1.SMID
   and r2.CSID = r1.CSID
   and r2.PURDATE > r1.PURDATE
   and not exists
       ( select 1 
           from purchases
          where SMID = r1.SMID
            and CSID = r1.CSID
            and PURDATE IN
                 ( r1.PURDATE - 1
                 , r2.PURDATE + 1 ) )

The query joins the table to itself based on SMID and CSID, such that the r2 PURDATE value is greater than the r1 value. (Yes, you are allowed to write an INNER JOIN that does not use equality as the join condition.) The NOT EXISTS subquery stipulates that the preceding or following value for the same SMID and CSID must be missing. Thus r1 and r2 are the endpoints of a sequence.

This query produces the following results:

SMID CSID STARTDT ENDDATE seq_count seq_diff
---- ---- ------- ------- --------- --------
 1    1    200501  200503    3         3
 1    1    200501  200508    5         8
 1    1    200507  200508    2         2
 1    2    200505  200506    2         2

Check the STARTDT and ENDDATE values of each result row to verify that the NOT EXISTS condition has been satisfied.

Notice that the count of the number of values in the sequence has been calculated, as well as the difference between first and last value. You can see immediately that the result rows we are interested in are the ones where these calculations are equal, which means that there are no internal gaps. The range 200501-200508 will be dropped because the difference is 8 but the count is only 5, which means there is a gap.

So let's move those calculations to the WHERE clause, and then use the filtered result set, which now contains only gap-free sequences, as a derived table in a join back to the main data table, with GROUP BY to get the sum of the quantities.

select gapfree.SMID
     , gapfree.CSID
     , gapfree.STARTDT
     , gapfree.ENDDATE
     , sum(data.PURQTY) as QTY
  from (
       select r1.SMID
            , r1.CSID
            , r1.PURDATE     as STARTDT
            , r2.PURDATE     as ENDDATE
         from purchases as r1
       inner
         join purchases as r2
           on r2.SMID = r1.SMID
          and r2.CSID = r1.CSID
          and r2.PURDATE > r1.PURDATE
          and not exists
              ( select 1 
                  from purchases
                 where SMID = r1.SMID
                   and CSID = r1.CSID
                   and PURDATE IN
                        ( r1.PURDATE - 1
                        , r2.PURDATE + 1 ) )
          and ( select count(*) 
                  from purchases
                 where SMID = r1.SMID 
                   and CSID = r1.CSID
               and PURDATE 
                       between r1.PURDATE 
                           and r2.PURDATE ) 
            = r2.PURDATE - r1.PURDATE  + 1 
       ) as gapfree
inner
  join purchases as data
    on data.SMID = gapfree.SMID
   and data.CSID = gapfree.CSID
   and data.PURDATE 
       between gapfree.STARTDT
           and gapfree.ENDDATE
group
    by gapfree.SMID
     , gapfree.CSID
     , gapfree.STARTDT
     , gapfree.ENDDATE

Seems a lot to expect of someone in an interview. Are you sure this wasn't a homework question? <grin>

Does anyone have a solution involving analytic SQL?


This was last published in November 2007

Dig Deeper on Oracle and SQL

PRO+

Content

Find more PRO+ content and other member only offers, here.

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

Start the conversation

Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

-ADS BY GOOGLE

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close