# Summing quantities in gapless sequences

## Here's a tough one for our SQL expert: how to sum quantities in gapless sequences?

At an interview for a data warehousing position, they asked me to write a query to get the result below from the given dataset:

```
DATA SET:

SMID  CSID  PURDATE  PURQTY
----  ----  -------  ------
1     1     200501   10
1     1     200502   12
1     1     200503   9

1     1     200507   10
1     1     200508   8

1     2     200505   10
1     2     200506   15

RESULT OF QUERY SHOULD BE:

SMID  CSID  STARTDT  ENDDATE  QTY
----  ----  -------  -------  ----
1     1     200501   200503   31
1     1     200507   200508   18
1     2     200505   200506   25
```

Unfortunately I could not figure out the expected answer. Please, can you take a look at it?

Oh, that's tricky. That's a pretty tough problem to throw at somebody in an interview.

Obviously what they were after was an analysis involving gapless sequences. There are two sequences for SMID=1 CSID=1, because of the gap between 200503 and 200507.
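If you want to follow along, the sample data can be loaded into a scratch table first. This is only a sketch: the table name purchases matches the queries in this answer, but the column types and the primary key are assumptions, since the question shows only column names and values.

```sql
-- Assumed schema: the integer types and the primary key are guesses,
-- because the question gives only the column names and sample values.
create table purchases
( SMID     integer not null
, CSID     integer not null
, PURDATE  integer not null   -- appears to be a period number such as YYYYMM
, PURQTY   integer not null
, primary key ( SMID, CSID, PURDATE )
);

-- One row per INSERT, which should work in most databases.
insert into purchases ( SMID, CSID, PURDATE, PURQTY ) values ( 1, 1, 200501, 10 );
insert into purchases ( SMID, CSID, PURDATE, PURQTY ) values ( 1, 1, 200502, 12 );
insert into purchases ( SMID, CSID, PURDATE, PURQTY ) values ( 1, 1, 200503,  9 );
insert into purchases ( SMID, CSID, PURDATE, PURQTY ) values ( 1, 1, 200507, 10 );
insert into purchases ( SMID, CSID, PURDATE, PURQTY ) values ( 1, 1, 200508,  8 );
insert into purchases ( SMID, CSID, PURDATE, PURQTY ) values ( 1, 2, 200505, 10 );
insert into purchases ( SMID, CSID, PURDATE, PURQTY ) values ( 1, 2, 200506, 15 );
```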

First, let's find the sequences. This is accomplished by looking for values that occur just preceding and just following a possible sequence. If there are none, then we have a sequence, although it may have gaps:

```
select r1.SMID
     , r1.CSID
     , r1.PURDATE     as STARTDT
     , r2.PURDATE     as ENDDATE
     , ( select count(*)
           from purchases
          where SMID = r1.SMID
            and CSID = r1.CSID
            and PURDATE
                between r1.PURDATE
                    and r2.PURDATE ) as seq_count
     , r2.PURDATE - r1.PURDATE  + 1  as seq_diff
  from purchases as r1
inner
  join purchases as r2
    on r2.SMID = r1.SMID
   and r2.CSID = r1.CSID
   and r2.PURDATE > r1.PURDATE
   and not exists
       ( select 1
           from purchases
          where SMID = r1.SMID
            and CSID = r1.CSID
            and PURDATE IN
                ( r1.PURDATE - 1
                , r2.PURDATE + 1 ) )
```

The query joins the table to itself based on SMID and CSID, such that the r2 PURDATE value is greater than the r1 value. (Yes, you are allowed to write an INNER JOIN that does not use equality as the join condition.) The NOT EXISTS subquery stipulates that the preceding or following value for the same SMID and CSID must be missing. Thus r1 and r2 are the endpoints of a sequence.
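To see the NOT EXISTS idea in isolation, here is a smaller sketch that looks only for starting points, that is, rows with no purchase in the immediately preceding period for the same SMID and CSID. The alias p is just for illustration, and the query assumes the same purchases table.

```sql
-- A row starts a sequence when no row exists for the same SMID and CSID
-- with a PURDATE exactly one less than its own.
select p.SMID
     , p.CSID
     , p.PURDATE as STARTDT
  from purchases p
 where not exists
       ( select 1
           from purchases
          where SMID = p.SMID
            and CSID = p.CSID
            and PURDATE = p.PURDATE - 1 )
 order
    by p.SMID
     , p.CSID
     , p.PURDATE
```

Against the sample data this returns three rows: 200501 and 200507 for CSID 1, and 200505 for CSID 2. The query in the answer applies the same test to both ends at once, checking the period before r1 and the period after r2.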

This query produces the following results:

```
SMID  CSID  STARTDT  ENDDATE  seq_count  seq_diff
----  ----  -------  -------  ---------  --------
1     1     200501   200503   3          3
1     1     200501   200508   5          8
1     1     200507   200508   2          2
1     2     200505   200506   2          2
```

Check the STARTDT and ENDDATE values of each result row to verify that the NOT EXISTS condition has been satisfied.

Notice that the count of the number of values in the sequence has been calculated, as well as the difference between first and last value. You can see immediately that the result rows we are interested in are the ones where these calculations are equal, which means that there are no internal gaps. The range 200501-200508 will be dropped because the difference is 8 but the count is only 5, which means there is a gap.
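One caveat about the seq_diff arithmetic: it works here because every sample PURDATE falls in the same year. If PURDATE is a year-and-month number in YYYYMM form (an assumption; the question doesn't say), then 200512 and 200601 are adjacent months but differ by 89, so both the subtraction and the PURDATE - 1 and PURDATE + 1 tests would misfire across a year boundary. A possible workaround, sketched below, is to convert each value to a running month number and do the arithmetic on that instead. The column name month_no is made up for illustration, and FLOOR and MOD are assumed to be available in your database (SQL Server, for example, uses % rather than MOD).

```sql
-- Convert an assumed YYYYMM value into a running month number,
-- so that consecutive months always differ by exactly 1.
-- Example: 200512 -> 2005 * 12 + 12 = 24072
--          200601 -> 2006 * 12 +  1 = 24073
select SMID
     , CSID
     , PURDATE
     , floor(PURDATE / 100) * 12
       + mod(PURDATE, 100)     as month_no
  from purchases
```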

So let's move those calculations to the WHERE clause, and then use the filtered result set, which now contains only gap-free sequences, as a derived table in a join back to the main data table, with GROUP BY to get the sum of the quantities.

```
select gapfree.SMID
     , gapfree.CSID
     , gapfree.STARTDT
     , gapfree.ENDDATE
     , sum(data.PURQTY) as QTY
  from (
       select r1.SMID
            , r1.CSID
            , r1.PURDATE     as STARTDT
            , r2.PURDATE     as ENDDATE
         from purchases as r1
       inner
         join purchases as r2
           on r2.SMID = r1.SMID
          and r2.CSID = r1.CSID
          and r2.PURDATE > r1.PURDATE
          and not exists
              ( select 1
                  from purchases
                 where SMID = r1.SMID
                   and CSID = r1.CSID
                   and PURDATE IN
                       ( r1.PURDATE - 1
                       , r2.PURDATE + 1 ) )
          and ( select count(*)
                  from purchases
                 where SMID = r1.SMID
                   and CSID = r1.CSID
                   and PURDATE
                       between r1.PURDATE
                           and r2.PURDATE )
              = r2.PURDATE - r1.PURDATE  + 1
       ) as gapfree
inner
  join purchases as data
    on data.SMID = gapfree.SMID
   and data.CSID = gapfree.CSID
   and data.PURDATE
       between gapfree.STARTDT
           and gapfree.ENDDATE
group
    by gapfree.SMID
     , gapfree.CSID
     , gapfree.STARTDT
     , gapfree.ENDDATE
```

Seems a lot to expect of someone in an interview. Are you sure this wasn't a homework question? <grin>

Does anyone have a solution involving analytic SQL?
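One possible answer, offered as a sketch rather than a definitive solution: number the rows within each SMID and CSID in PURDATE order with ROW_NUMBER(), then subtract that number from PURDATE. Within a gapless run both values rise by one per row, so the difference stays constant and can serve as a grouping key. The names numbered and grp are invented here, and the query assumes a database that supports window functions and a table with no duplicate PURDATE values for the same SMID and CSID.

```sql
select SMID
     , CSID
     , min(PURDATE) as STARTDT
     , max(PURDATE) as ENDDATE
     , sum(PURQTY)  as QTY
  from (
       -- grp is constant within a gapless run, e.g. for SMID=1 CSID=1:
       --   200501 - 1, 200502 - 2, 200503 - 3  all give 200500
       --   200507 - 4, 200508 - 5              both give 200503
       select SMID
            , CSID
            , PURDATE
            , PURQTY
            , PURDATE
              - row_number() over ( partition by SMID, CSID
                                    order by PURDATE ) as grp
         from purchases
       ) numbered
group
    by SMID
     , CSID
     , grp
order
    by SMID
     , CSID
     , STARTDT
```

With the sample data this produces the three rows the interviewer asked for, with QTY values of 31, 18 and 25. Two differences from the self-join version are worth noting: this query also returns a run consisting of a single row, which the join condition r2.PURDATE > r1.PURDATE excludes, and it likewise relies on consecutive periods differing by exactly 1.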

This was last published in November 2007
