Ask the Expert

Averages over a span of years -- Part 1

For the following sample relation:

subject  | year | enrolled
---------+------+---------
subject1 | 1998 |       20
subject1 | 1999 |       23
subject1 | 2000 |       16
subject2 | 1999 |       10
subject2 | 2000 |       21
subject3 | 2000 |        9

How would I create a query that calculates the average enrollment for each subject over the years? Thanks!



The answer depends on what is meant by an average "over the years."

Here's a solution involving a straightforward average calculation, using the AVG function:

select subject
     , avg(enrolled) as avgamt
  from subjects
group
    by subject

subject   avgamt
subject1   19.67
subject2   15.50
subject3    9.00

Everything looks okay, right? Each subject has one or more entries in the table, and the average was calculated as the sum per subject divided by the number of rows, right?
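As a quick check on that arithmetic, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's sqlite3 module and returns the sum and row count next to each average. The choice of SQLite and the column name theyear (borrowed from the join conditions used later in this answer) are assumptions made for the illustration.

```python
import sqlite3

# Sample rows from the question. The column name "theyear" is an assumption
# taken from the join conditions used later in the answer.
ROWS = [
    ("subject1", 1998, 20), ("subject1", 1999, 23), ("subject1", 2000, 16),
    ("subject2", 1999, 10), ("subject2", 2000, 21),
    ("subject3", 2000, 9),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table subjects (subject text, theyear integer, enrolled integer)"
)
conn.executemany("insert into subjects values (?, ?, ?)", ROWS)

# avg() shown next to the sum and the row count it is derived from
query = """
select subject
     , sum(enrolled) as total
     , count(*)      as numrows
     , avg(enrolled) as avgamt
  from subjects
 group by subject
 order by subject
"""
results = conn.execute(query).fetchall()

for subject, total, numrows, avgamt in results:
    print(subject, total, numrows, round(avgamt, 2))
```

For subject1 this gives a total of 59 over 3 rows, which is the 19.67 shown above.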

But what if the average needs to be calculated over all years in the span of years from 1998 to 2000? How do we deal with the fact that some subjects are missing some years?

What we could do is supply the missing years for each subject. There's more than one way to do this, but here's a simple one. The following query uses the integers table (described in Finding all the dates between two dates, 10 June 2002, and also in Aggregates for date ranges, 4 October 2002). The integers table is joined with the original table in a cross join to generate the desired range of years for each subject:

select distinct
       subject
     , 1998+i as theyear
  from integers
     , subjects
 where i between 0 and 2

subject   theyear
subject1     1998
subject1     1999
subject1     2000
subject2     1998
subject2     1999
subject2     2000
subject3     1998
subject3     1999
subject3     2000

How did we know to use "1998+i" and "i between 0 and 2" in this query? By inspection. Actually, in the general case, inspection would not be used, and instead, additional subqueries would obtain the lowest and highest years from the sample data.
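One way the general case might look is sketched below, again run through Python's sqlite3 module. The hard-coded 1998 and 2 are replaced with scalar subqueries on min(theyear) and max(theyear). The contents of the integers table (0 through 9 in a column named i) and the column name theyear are assumptions for this sketch, and other database systems may need small syntax changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table subjects (subject text, theyear integer, enrolled integer);
insert into subjects values
  ('subject1', 1998, 20), ('subject1', 1999, 23), ('subject1', 2000, 16),
  ('subject2', 1999, 10), ('subject2', 2000, 21),
  ('subject3', 2000, 9);
-- assumed shape of the integers table: one column i holding 0 through 9
create table integers (i integer);
insert into integers values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
""")

# The lowest year and the width of the span come from the data itself
query = """
select distinct
       subject
     , (select min(theyear) from subjects) + i as theyear
  from integers
     , subjects
 where i between 0
             and (select max(theyear) - min(theyear) from subjects)
 order by 1, 2
"""
allyears = conn.execute(query).fetchall()

for subject, theyear in allyears:
    print(subject, theyear)
```

With the sample data this produces the same nine rows as the version written by inspection. The integers table only has to hold at least as many values as there are years in the span.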

We can now use the results of this cross join as a derived table and join it to the original table. We want to use a left outer join, since we know some rows will not match:

select allyears.subject
     , allyears.theyear
     , enrolled
  from (
       select distinct
              subject
            , 1998+i as theyear
         from integers
            , subjects
        where i between 0 and 2
       ) as allyears
left outer
  join subjects
    on allyears.subject = subjects.subject
   and allyears.theyear = subjects.theyear
order
    by allyears.subject
     , allyears.theyear

subject   theyear  enrolled
subject1     1998        20
subject1     1999        23
subject1     2000        16
subject2     1998         -
subject2     1999        10
subject2     2000        21
subject3     1998         -
subject3     1999         -
subject3     2000         9
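To see why the left outer join matters here, the following sketch (Python's sqlite3 module, with the same assumed integers table and theyear column name) counts the rows produced by an inner join and by a left outer join, then lists the subject and year combinations that have no matching row. The helper name count_rows is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table subjects (subject text, theyear integer, enrolled integer);
insert into subjects values
  ('subject1', 1998, 20), ('subject1', 1999, 23), ('subject1', 2000, 16),
  ('subject2', 1999, 10), ('subject2', 2000, 21),
  ('subject3', 2000, 9);
-- assumed shape of the integers table: one column i holding 0 through 9
create table integers (i integer);
insert into integers values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
""")

# The derived table and join condition from the query above
ALLYEARS = """
( select distinct subject, 1998+i as theyear
    from integers, subjects
   where i between 0 and 2 ) as allyears
"""
JOIN_ON = """
  on allyears.subject = subjects.subject
 and allyears.theyear = subjects.theyear
"""

def count_rows(join_type):
    """Count the rows produced by the given kind of join (hypothetical helper)."""
    sql = f"select count(*) from {ALLYEARS} {join_type} join subjects {JOIN_ON}"
    return conn.execute(sql).fetchone()[0]

inner_rows = count_rows("inner")       # only combinations that have data
outer_rows = count_rows("left outer")  # every subject/year combination

# Combinations kept only by the outer join: no matching row in subjects
missing = conn.execute(f"""
select allyears.subject, allyears.theyear
  from {ALLYEARS}
  left outer join subjects {JOIN_ON}
 where subjects.subject is null
 order by 1, 2
""").fetchall()

print(inner_rows, outer_rows)
print(missing)
```

The inner join returns only the six rows that already exist, while the left outer join returns all nine, with NULL in enrolled for the three combinations shown as "-" above.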

Okay, that looks fine. So let's try the averages again:

select allyears.subject
     , avg(enrolled) as avgamt
  from (
       select distinct
              subject
            , 1998+i as theyear
         from integers
            , subjects
        where i between 0 and 2
       ) as allyears
left outer
  join subjects
    on allyears.subject = subjects.subject
   and allyears.theyear = subjects.theyear
group
    by allyears.subject

subject   avgamt
subject1   19.67
subject2   15.50
subject3    9.00

Uh oh. These are our original results. How can this be?

The explanation is that aggregate functions exclude NULLs. Please see Part 2 of this answer for more information on working with NULLs and aggregates.
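The effect can be observed directly by putting a few more aggregates next to the average. This is a minimal sketch using Python's sqlite3 module, with the same assumptions as before (an integers table holding 0 through 9 in column i, and a year column named theyear). It compares count(*), which counts every joined row, with count(enrolled), which skips NULLs just as avg does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table subjects (subject text, theyear integer, enrolled integer);
insert into subjects values
  ('subject1', 1998, 20), ('subject1', 1999, 23), ('subject1', 2000, 16),
  ('subject2', 1999, 10), ('subject2', 2000, 21),
  ('subject3', 2000, 9);
-- assumed shape of the integers table: one column i holding 0 through 9
create table integers (i integer);
insert into integers values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
""")

query = """
select allyears.subject
     , count(*)        as allrows   -- every joined row, NULL or not
     , count(enrolled) as nonnull   -- only rows where enrolled is not NULL
     , sum(enrolled)   as total
     , avg(enrolled)   as avgamt
  from ( select distinct subject, 1998+i as theyear
           from integers, subjects
          where i between 0 and 2 ) as allyears
  left outer join subjects
    on allyears.subject = subjects.subject
   and allyears.theyear = subjects.theyear
 group by allyears.subject
 order by allyears.subject
"""
stats = conn.execute(query).fetchall()

for subject, allrows, nonnull, total, avgamt in stats:
    print(subject, allrows, nonnull, total, round(avgamt, 2))
```

Every subject now has three joined rows, but the average is still the total divided by the number of non-NULL values: 31 / 2 for subject2 and 9 / 1 for subject3, not a division by 3.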


This was first published in November 2002
