I am struggling with writing a query to count consecutive years from the current year. I do not want to use a cursor. Our data looks something like this:
ID DateCol 1 02/01/2006 1 01/01/2006 1 01/01/2005 1 01/01/2004 1 01/01/1999 2 02/01/2006 2 01/01/2005 3 04/01/2006 3 04/01/1999 4 06/30/2000 4 08/01/1999
My desired output would be something like this:
ID ConYears 1 3 2 2 3 1
There is no requirement for the dates to be 365 days apart, just that they have different years. Any thoughts would be appreciated.
What a nice problem. There are probably ways to do it with just the single table, using a self-join technique similar to the one described in Gaps in sequences (9 February 2004). However, this could get complicated, because you said "consecutive years from the current year" and the first gap could be the current year, which the self-join wouldn't know of.
Once again we can use the handy-dandy integers table to help. If you don't have an integers table, you should; it's small, efficient and very useful.
create table integers (i integer not null ) insert into integers values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
The integers table can then be used to generate the last 10 years:
select year(current_date) - i as yr from integers yr 2006 2005 2004 2003 2002 2001 2000 1999 1998 1997
The solution uses a CROSS JOIN and a LEFT OUTER JOIN. The CROSS JOIN creates all possible combinations of ID and year. Then the LEFT OUTER JOIN attempts to match each such ID and year combination to a row in the data table.
select X.ID , max(X.yr) as FirstMissing , year(current_date) -max(X.yr) as ConYears from ( select ID , year(current_date) - i as yr from integers cross join ( select distinct ID from datatable ) as I ) as X left outer join datatable as T on T.ID = X.ID and year(T.DateCol) = X.yr where T.ID is null group by X.ID having year(current_date) -max(X.yr) > 0
The derived table called X contains each combination of ID and year. This is the left table in the outer join, and it is joined to the data table, such that it matches the ID and year of the data. Note that it doesn't matter if more than one row of the data table matches, as is the case in your original data for ID 1 and year 2006.
Where a matching row is not found, using the IS NULL condition in the WHERE clause, that combination of ID and year is retained (matching rows are filtered out), and then, using a GROUP BY, only the maximum year which was not found for each ID is chosen, and the number of consecutive years calculated for each ID. Finally, the HAVING clause rejects any IDs like 4 which had 0 consecutive years from the current year.
ID FirstMissing ConYears 1 2003 3 2 2004 2 3 2005 1
Pretty neat, eh?
This was first published in January 2006