If I have a table with fields MarketId and XY, where MarketID is a unique character field and for this example XY can only contain "X" or "Y." My example data is this:

MarketID  XY
--------  --
1         X
2         X
3         X
4         Y
5         Y
6         X
7         X

What I need to do is count the occurrences of field XY but in MarketId order to end up with a result something like this:

XY    Count
--    -----
X       3
Y       2
X       2

Thanks for your help.

    Requires Free Membership to View

These are called "runs" or "sequences" and the SQL can sure be tricky.

In this table, the MarketID column provides the inherent ordering of data. Image the IDs as points on a line:

------o------o------o------o------

Now we will self-join this table:

  from Market_XY as t1
inner
  join Market_XY as t3
    on t3.MarketID > t1.MarketID
   and t3.XY = t1.XY 

We will pair up each row (t1) to all other rows (t3) that have a higher ID as well as the same XY. The same XY means these two rows are a potential run:

------o------o------o------o------
      |      |      |      |
      t1     t3---->t3---->t3---->

Next, we self-join once more, but with a LEFT OUTER JOIN, using an IS NULL condition in the WHERE clause to make sure no matches are found:

  from Market_XY as t1
inner
  join Market_XY as t3
    on t3.MarketID > t1.MarketID
   and t3.XY = t1.XY
left outer
  join Market_XY as t2
    on t2.MarketID between t1.MarketID and t3.MarketID
   and t2.XY <> t3.XY
 where t2.MarketID is null

Essentially, we want all t1-t3 runs, where there is no t2 between them with a different XY. The LEFT OUTER JOIN looks for them, but the IS NULL keeps only the t1-t3 pairs where no t2 exists.

But obviously there will be overlaps of runs. In the given sample data, notice that the run (1,2,3) of Xs, has runs (1,2) and (2,3) inside it. What remains now is simply to take the longest runs. This involves careful grouping and the use of both MIN() and MAX().

select XY
     , hiID - min(loID) + 1 as Count
  from (
       select t1.MarketID as loID
            , t1.XY
            , max(t3.MarketID) as hiID
         from Market_XY as t1
       inner
         join Market_XY as t3
           on t3.MarketID > t1.MarketID
          and t3.XY = t1.XY
       left outer
         join Market_XY as t2
           on t2.MarketID
              between t1.MarketID
                  and t3.MarketID
          and t2.XY <> t3.XY
        where t2.MarketID is null
       group
           by t1.MarketID
            , t1.XY
       ) as hiIDs
group
    by XY
     , hiID
order
    by hiID

It looks tricky but it really isn't. To understand it, run the subquery alone, without the MAX() or GROUP BY, but showing the t3 columns. Then see what happens with the MAX() and GROUP BY. Then apply the outer query with its GROUP BY and MIN(). Note that MAX - MIN + 1 gives the count for the run.

P.S. If anyone has an analytic SQL solution for this, please do send it in. I would love to see it.

This was first published in March 2007

There are Comments. Add yours.

 
TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
Sort by: OldestNewest

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to: