If I have a table with fields MarketId and XY, where MarketID is a unique character field and for this example XY can only contain "X" or "Y." My example data is this:
MarketID XY -------- -- 1 X 2 X 3 X 4 Y 5 Y 6 X 7 X
What I need to do is count the occurrences of field XY but in MarketId order to end up with a result something like this:
XY Count -- ----- X 3 Y 2 X 2
Thanks for your help.
These are called "runs" or "sequences" and the SQL can sure be tricky.
In this table, the MarketID column provides the inherent ordering of data. Image the IDs as points on a line:
Now we will self-join this table:
from Market_XY as t1 inner join Market_XY as t3 on t3.MarketID > t1.MarketID and t3.XY = t1.XY
We will pair up each row (t1) to all other rows (t3) that have a higher ID as well as the same XY. The same XY means these two rows are a potential run:
------o------o------o------o------ | | | | t1 t3---->t3---->t3---->
Next, we self-join once more, but with a LEFT OUTER JOIN, using an IS NULL condition in the WHERE clause to make sure no matches are found:
from Market_XY as t1 inner join Market_XY as t3 on t3.MarketID > t1.MarketID and t3.XY = t1.XY left outer join Market_XY as t2 on t2.MarketID between t1.MarketID and t3.MarketID and t2.XY <> t3.XY where t2.MarketID is null
Essentially, we want all t1-t3 runs, where there is no t2 between them with a different XY. The LEFT OUTER JOIN looks for them, but the IS NULL keeps only the t1-t3 pairs where no t2 exists.
But obviously there will be overlaps of runs. In the given sample data, notice that the run (1,2,3) of Xs, has runs (1,2) and (2,3) inside it. What remains now is simply to take the longest runs. This involves careful grouping and the use of both MIN() and MAX().
select XY , hiID - min(loID) + 1 as Count from ( select t1.MarketID as loID , t1.XY , max(t3.MarketID) as hiID from Market_XY as t1 inner join Market_XY as t3 on t3.MarketID > t1.MarketID and t3.XY = t1.XY left outer join Market_XY as t2 on t2.MarketID between t1.MarketID and t3.MarketID and t2.XY <> t3.XY where t2.MarketID is null group by t1.MarketID , t1.XY ) as hiIDs group by XY , hiID order by hiID
It looks tricky but it really isn't. To understand it, run the subquery alone, without the MAX() or GROUP BY, but showing the t3 columns. Then see what happens with the MAX() and GROUP BY. Then apply the outer query with its GROUP BY and MIN(). Note that MAX - MIN + 1 gives the count for the run.
P.S. If anyone has an analytic SQL solution for this, please do send it in. I would love to see it.
This was first published in March 2007