Q

Averages over a span of years -- Part 1

For the following sample relation:

subject  | year | enrolled
---------+------+---------
subject1 | 1998 | 20
subject1 | 1999 | 23
subject1 | 2000 | 16
subject2 | 1999 | 10
subject2 | 2000 | 21
subject3 | 2000 | 9

How would I create a query that calculates the average enrollment for each subject over the years? Thanks!


The answer depends on what is meant by an average "over the years."

Here's a solution involving a straightforward average calculation, using the AVG function:

select subject
     , avg(enrolled) as avgamt
  from subjects
 group by subject

subject   avgamt
subject1  19.67
subject2  15.50
subject3   9.00
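If you want to try this yourself, here is a minimal sketch that loads the sample relation and runs the same query. It assumes SQLite, driven from Python's sqlite3 module, purely for convenience; the question does not say which database is in use. The round() call is added only to match the two-decimal display above.

```python
import sqlite3

# In-memory SQLite database holding the sample relation from the question.
# SQLite is an assumption made for this sketch, not something the question specifies.
conn = sqlite3.connect(":memory:")
conn.execute("create table subjects (subject text, year integer, enrolled integer)")
conn.executemany(
    "insert into subjects values (?, ?, ?)",
    [
        ("subject1", 1998, 20),
        ("subject1", 1999, 23),
        ("subject1", 2000, 16),
        ("subject2", 1999, 10),
        ("subject2", 2000, 21),
        ("subject3", 2000, 9),
    ],
)

# Straightforward average: sum of enrolled per subject divided by its row count.
simple_avg = conn.execute(
    """
    select subject
         , round(avg(enrolled), 2) as avgamt
      from subjects
     group by subject
     order by subject
    """
).fetchall()

for subject, avgamt in simple_avg:
    print(subject, avgamt)
```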

Everything looks okay, right? Each subject has one or more rows in the table, and each average was calculated as the sum of enrolled for that subject divided by its number of rows.

But what if the average needs to be calculated over all years in the span of years from 1998 to 2000? How do we deal with the fact that some subjects are missing some years?

What we could do is supply the missing years for each subject. There's more than one way to do this, but here's a simple one. The following query uses the integers table (described in Finding all the dates between two dates, 10 June 2002, and also in Aggregates for date ranges, 4 October 2002). The integers table is joined with the original table in a cross join to generate the desired range of years for each subject:

select distinct subject
     , 1998+i as theyear
  from integers
     , subjects
 where i between 0 and 2

subject   theyear
subject1  1998
subject1  1999
subject1  2000
subject2  1998
subject2  1999
subject2  2000
subject3  1998
subject3  1999
subject3  2000

How did we know to use "1998+i" and "i between 0 and 2" in this query? By inspection of the sample data. In the general case we would not rely on inspection; instead, additional subqueries would obtain the lowest and highest years from the data.
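The general case just mentioned can be sketched as follows. This is only an illustration under stated assumptions: SQLite (via Python's sqlite3 module) stands in for the database, the integers table is assumed to hold 0 through 9, and the aliases n, s and b and the column names minyear and maxyear are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample relation from the question.
conn.execute("create table subjects (subject text, year integer, enrolled integer)")
conn.executemany(
    "insert into subjects values (?, ?, ?)",
    [
        ("subject1", 1998, 20),
        ("subject1", 1999, 23),
        ("subject1", 2000, 16),
        ("subject2", 1999, 10),
        ("subject2", 2000, 21),
        ("subject3", 2000, 9),
    ],
)

# Integers table; the range 0..9 is an assumption for this sketch.
conn.execute("create table integers (i integer)")
conn.executemany("insert into integers values (?)", [(n,) for n in range(10)])

# The derived table b supplies the lowest and highest years, so neither
# 1998 nor the upper bound 2 has to be hard-coded.
year_grid = conn.execute(
    """
    select distinct s.subject
         , b.minyear + n.i as theyear
      from integers as n
         , subjects as s
         , ( select min(year) as minyear
                  , max(year) as maxyear
               from subjects ) as b
     where n.i between 0 and b.maxyear - b.minyear
     order by s.subject
         , theyear
    """
).fetchall()

for subject, theyear in year_grid:
    print(subject, theyear)
```

With the sample data this should produce the same nine subject/year combinations as the hard-coded version. Note that the integers table must contain at least as many values as there are years in the span.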

We can now use the results of this cross join as a derived table and join it to the original table. We want to use a left outer join, since we know some rows will not match:

select allyears.subject
     , allyears.theyear
     , enrolled
  from (
       select distinct subject
            , 1998+i as theyear
         from integers
            , subjects
        where i between 0 and 2
       ) as allyears
left outer
  join subjects
    on allyears.subject = subjects.subject
   and allyears.theyear = subjects.year
 order by allyears.subject
     , allyears.theyear

subject   theyear  enrolled
subject1  1998     20
subject1  1999     23
subject1  2000     16
subject2  1998     -
subject2  1999     10
subject2  2000     21
subject3  1998     -
subject3  1999     -
subject3  2000     9

Okay, that looks fine. So let's try the averages again:

select allyears.subject
     , avg(enrolled) as avgamt
  from (
       select distinct subject
            , 1998+i as theyear
         from integers
            , subjects
        where i between 0 and 2
       ) as allyears
left outer
  join subjects
    on allyears.subject = subjects.subject
   and allyears.theyear = subjects.year
 group by allyears.subject

subject   avgamt
subject1  19.67
subject2  15.50
subject3   9.00

Uh oh. These are our original results. How can this be?

The explanation is that aggregate functions exclude NULLs. Please see Part 2 of this answer for more information on working with NULLs and aggregates.
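To see the NULL exclusion at work, here is a small sketch that puts count(*), count(enrolled) and avg(enrolled) side by side for the outer join. It again assumes SQLite through Python's sqlite3 module and an integers table holding 0 through 9; the column names yearrows and nonnull are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample relation from the question.
conn.execute("create table subjects (subject text, year integer, enrolled integer)")
conn.executemany(
    "insert into subjects values (?, ?, ?)",
    [
        ("subject1", 1998, 20),
        ("subject1", 1999, 23),
        ("subject1", 2000, 16),
        ("subject2", 1999, 10),
        ("subject2", 2000, 21),
        ("subject3", 2000, 9),
    ],
)

# Integers table; the range 0..9 is an assumption for this sketch.
conn.execute("create table integers (i integer)")
conn.executemany("insert into integers values (?)", [(n,) for n in range(10)])

# count(*) counts every row the outer join produces, while count(enrolled)
# and avg(enrolled) skip the rows where enrolled is NULL.
null_check = conn.execute(
    """
    select allyears.subject
         , count(*) as yearrows
         , count(enrolled) as nonnull
         , round(avg(enrolled), 2) as avgamt
      from (
           select distinct subject
                , 1998+i as theyear
             from integers
                , subjects
            where i between 0 and 2
           ) as allyears
    left outer
      join subjects
        on allyears.subject = subjects.subject
       and allyears.theyear = subjects.year
     group by allyears.subject
     order by allyears.subject
    """
).fetchall()

for subject, yearrows, nonnull, avgamt in null_check:
    print(subject, yearrows, nonnull, avgamt)
```

Every subject now has three rows, one per year, but only the non-NULL enrolled values feed the average, which is why the averages have not changed.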


This was first published in November 2002
