To continue reading for free, register below or login
To read more you must become a member of SearchOracle.com
');
// -->

Oh, that's tricky. That's a pretty tough problem to throw
at somebody in an interview.
Obviously what they were after was an analysis involving
gap-less sequences. There are two sequences
for SMID=1 CSID=1, because of the gap between 200503 and 200507.
First, let's find the sequences. This is accomplished by looking
for values that occur just preceding and just following
a possible sequence. If there are none, then we have a sequence,
although it may have gaps:
select r1.SMID
, r1.CSID
, r1.PURDATE as STARTDT
, r2.PURDATE as ENDDATE
, ( select count(*)
from purchases
where SMID = r1.SMID
and CSID = r1.CSID
and PURDATE
between r1.PURDATE
and r2.PURDATE ) as seq_count
, r2.PURDATE - r1.PURDATE + 1 as seq_diff
from purchases as r1
inner
join purchases as r2
on r2.SMID = r1.SMID
and r2.CSID = r1.CSID
and r2.PURDATE > r1.PURDATE
and not exists
( select 1
from purchases
where SMID = r1.SMID
and CSID = r1.CSID
and PURDATE IN
( r1.PURDATE - 1
, r2.PURDATE + 1 ) )
The query joins the table to itself based on SMID and CSID,
such that the r2 PURDATE value is greater than the r1 value. (Yes,
you are allowed to write an INNER JOIN that does not use
equality as the join condition.) The NOT EXISTS subquery stipulates
that the preceding or following value for the same SMID and CSID must
be missing. Thus r1 and r2 are the endpoints of a sequence.
This query produces the following results:
SMID CSID STARTDT ENDDATE seq_count seq_diff
---- ---- ------- ------- --------- --------
1 1 200501 200503 3 3
1 1 200501 200508 5 8
1 1 200507 200508 2 2
1 2 200505 200506 2 2
Check the STARTDT and ENDDATE values of each result row to verify
that the NOT EXISTS condition has been satisfied.
Notice that the count of the number of values in the sequence has been
calculated, as well as the difference between first and last value.
You can see immediately that the result rows we are interested in are
the ones where these calculations are equal, which means that there
are no internal gaps. The range 200501-200508 will be dropped because the
difference is 8 but the count is only 5, which means there is a gap.
So let's move those calculations to the WHERE clause, and then use the
filtered result set, which now contains only gap-free sequences,
as a derived table in a join back to the main data table,
with GROUP BY to get the sum of the quantities.
select gapfree.SMID
, gapfree.CSID
, gapfree.STARTDT
, gapfree.ENDDATE
, sum(data.PURQTY) as QTY
from (
select r1.SMID
, r1.CSID
, r1.PURDATE as STARTDT
, r2.PURDATE as ENDDATE
from purchases as r1
inner
join purchases as r2
on r2.SMID = r1.SMID
and r2.CSID = r1.CSID
and r2.PURDATE > r1.PURDATE
and not exists
( select 1
from purchases
where SMID = r1.SMID
and CSID = r1.CSID
and PURDATE IN
( r1.PURDATE - 1
, r2.PURDATE + 1 ) )
and ( select count(*)
from purchases
where SMID = r1.SMID
and CSID = r1.CSID
and PURDATE
between r1.PURDATE
and r2.PURDATE )
= r2.PURDATE - r1.PURDATE + 1
) as gapfree
inner
join purchases as data
on data.SMID = gapfree.SMID
and data.CSID = gapfree.CSID
and data.PURDATE
between gapfree.STARTDT
and gapfree.ENDDATE
group
by gapfree.SMID
, gapfree.CSID
, gapfree.STARTDT
, gapfree.ENDDATE
Seems a lot to expect of someone in an interview. Are you sure this
wasn't a homework question? <grin>
Does anyone have a solution involving analytic SQL?
|