Compare last year sales to this year sales
I have a sales table and want a way to compare 2004 sales versus 2005 sales in one SQL query. The table is:
table SALES (Distributor_ID number, product_type char, product_line char, sales_year date, quantity_sold number)
Want to show output broken down by distributor, product_type, product_line. Example (Distributor_id 1):
Dist Type Line Sold2004 Sold2005 1 A X 0 5 1 B Y 2 3 1 C Z 3 0
Your sales_year column is a DATE datatype? Are you sure? <grin />
For simplicity, so as not to clutter the SQL with DATEPART functions to extract the year, let's just assume your sales_year column is actually one of the integer datatypes. If it's truly a DATE datatype, the following SQL will need to be tweaked slightly (each database system has totally different date functions, but they all handle integers the same way).
One way to approach this is to use derived tables to summarize the sales by each year. Then you can simply join the two derived tables by year:
select D1.Dist , D1.Type , D1.Line , D1.Sold2004 , D2.Sold2005 from ( select Distributor_ID as Dist , product_type as Type , product_line as Line , sum(quantity_sold) as Sold2004 from SALES where sales_year = 2004 group by Distributor_ID , product_type , product_line ) as D1 full outer join ( select Distributor_ID as Dist , product_type as Type , product_line as Line , sum(quantity_sold) as Sold2005 from SALES where sales_year = 2005 group by Distributor_ID , product_type , product_line ) as D2 on D1.Distributor_ID = D2.Distributor_ID and D1.Type = D2.Type and D1.Line = D2.Line
The "gotcha" here is that it must be a FULL OUTER JOIN. You could have discontinued products with sales in 2004 but no sales in 2005. You could have brand new products with no sales in 2004 but sales in 2005.
Guess what's going to happen if you do it this way, though. The day this report goes out the door, your client's going to ask you to compare 2003, 2004, and 2005 side by side, and then your FULL OUTER JOIN suddenly becomes a whole lot messier. You could end up with partial results for 2003 and 2004, separate rows for 2003 and 2005, and so on. You'll be tempted just to slap another aggregate query around it all with SUMs and another GROUP BY.
There is an easier way.
select Distributor_ID as Dist , product_type as Type , product_line as Line , sum( case when sales_year = 2003 then quantity_sold else 0 end ) as Sold2003 , sum( case when sales_year = 2004 then quantity_sold else 0 end ) as Sold2004 , sum( case when sales_year = 2005 then quantity_sold else 0 end ) as Sold2005 from SALES group by Distributor_ID , product_type , product_line
So much easier. Some SQL authors will leave out the ELSE 0, which means that in some instances the CASE value will be NULL. But remember, aggregate functions like SUM ignore NULLs, so this is fairly safe to do. However, my preference is to code ELSE 0 anyway.