Q

Counting rows in runs

I have a table with fields MarketId and XY, where MarketID is a unique character field and for this example XY can only contain "X" or "Y." I need to count the occurrences of field XY and end up with a result like this.

If I have a table with fields MarketId and XY, where MarketID is a unique character field and for this example XY can only contain "X" or "Y." My example data is this:

MarketID  XY
--------  --
1         X
2         X
3         X
4         Y
5         Y
6         X
7         X

What I need to do is count the occurrences of field XY but in MarketId order to end up with a result something like this:

XY    Count
--    -----
X       3
Y       2
X       2

Thanks for your help.

These are called "runs" or "sequences" and the SQL can sure be tricky.

In this table, the MarketID column provides the inherent ordering of data. Image the IDs as points on a line:

------o------o------o------o------

Now we will self-join this table:

  from Market_XY as t1
inner
  join Market_XY as t3
    on t3.MarketID > t1.MarketID
   and t3.XY = t1.XY 

We will pair up each row (t1) to all other rows (t3) that have a higher ID as well as the same XY. The same XY means these two rows are a potential run:

------o------o------o------o------
      |      |      |      |
      t1     t3---->t3---->t3---->

Next, we self-join once more, but with a LEFT OUTER JOIN, using an IS NULL condition in the WHERE clause to make sure no matches are found:

  from Market_XY as t1
inner
  join Market_XY as t3
    on t3.MarketID > t1.MarketID
   and t3.XY = t1.XY
left outer
  join Market_XY as t2
    on t2.MarketID between t1.MarketID and t3.MarketID
   and t2.XY <> t3.XY
 where t2.MarketID is null

Essentially, we want all t1-t3 runs, where there is no t2 between them with a different XY. The LEFT OUTER JOIN looks for them, but the IS NULL keeps only the t1-t3 pairs where no t2 exists.

But obviously there will be overlaps of runs. In the given sample data, notice that the run (1,2,3) of Xs, has runs (1,2) and (2,3) inside it. What remains now is simply to take the longest runs. This involves careful grouping and the use of both MIN() and MAX().

select XY
     , hiID - min(loID) + 1 as Count
  from (
       select t1.MarketID as loID
            , t1.XY
            , max(t3.MarketID) as hiID
         from Market_XY as t1
       inner
         join Market_XY as t3
           on t3.MarketID > t1.MarketID
          and t3.XY = t1.XY
       left outer
         join Market_XY as t2
           on t2.MarketID
              between t1.MarketID
                  and t3.MarketID
          and t2.XY <> t3.XY
        where t2.MarketID is null
       group
           by t1.MarketID
            , t1.XY
       ) as hiIDs
group
    by XY
     , hiID
order
    by hiID

It looks tricky but it really isn't. To understand it, run the subquery alone, without the MAX() or GROUP BY, but showing the t3 columns. Then see what happens with the MAX() and GROUP BY. Then apply the outer query with its GROUP BY and MIN(). Note that MAX - MIN + 1 gives the count for the run.

P.S. If anyone has an analytic SQL solution for this, please do send it in. I would love to see it.

This was first published in March 2007

Dig deeper on Oracle and SQL

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

-ADS BY GOOGLE

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close