Problem solve Get help with specific problems with your technologies, process and projects.

Customers with orders for the last N years

I want to see your solution to this problem. There are two tables, a customer and an order table, with one-to-many relationship between customers and orders. The order table has a date. I need to find all customers that placed orders every year for the last N years. I have a few solutions, but what is the best? What would you do?

What is the best solution? An interesting question. My rules for a good solution are:

  1. The query produces the right answer.

    This is often quite difficult to confirm just by looking at the answer. However, you must meet this criterion. A wrong answer means the query is a bad solution, one hundred percent bad.

    To obtain the right answer, a query development strategy is required. Run subqueries as simple standalone queries to show intermediate results. Keep a log or audit trail of queries and results, building up to more complex queries. Show details for GROUP BY queries, and display keys to ensure joins are working correctly, especially outer joins. Use comprehensive test data encompassing missing rows, extra rows, NULLs, etc.

  2. The query finishes.

    We've all submitted a query which turns the little hourglass icon on the computer screen into a calendar. Some of us have even routinely submitted one as the last task before we lock the keyboard and go home, so that it may run overnight and produce results by morning.

    Make sure the query is efficient. Declare column indexes as required, and try a few different ways of writing the query. Sometimes subqueries can be written as joins and vice versa. Use database facilities such as EXPLAIN or SHOW PLAN to understand how the query is being executed.

  3. The query is easy to understand.

    This is subjective, but not as much as you might think. Shorter, simpler queries are better. A complex query can be difficult to maintain, should changes prove necessary. Remember, it may be you six months down the road who needs to understand it.

That said, here's a query to select customers who have placed an order every year for the last 5 years. You be the judge of how "good" it is:

select customername
  from customers
  join orders
    on customer.id = orders.customerid
 where orderdate 
       between current_date - interval '5' year
           and current_date     
    by customername
having count(distinct year(orderdate)) >= 5 

The HAVING clause ensures that the number of different years is greater than or equal to 5, and not just equal to 5, because it could be 6. To me, the reason why it could be 6 is obvious, but only because the query is simple and uncluttered. Going back an interval of 5 years from today, a day in February 2004, puts us in February, 1999. That 5 year interval therefore spans six different calendar years. A customer could have an order in late 1999, 2000, 2001, 2002, 2003, and early 2004. This would not have been at all obvious (breaks Rule 3) if the query had said greater than 4 which produces the same result (meets Rule 1), i.e. selects customers with orders in either 5 or 6 different years.

Your requirement for customers who "placed orders every year" might seem to imply more than one order in each year, but the above query assumes (thereby possibly breaking Rule 1) that at least one order in each calendar year is sufficient. It's actually possible for a customer to have five orders in that five year interval and no two orders within a year of each other; however, it is obvious that the query is not doing anything about such nuances. The nicest thing about simplicity is how easily you can see the complexity that isn't there.

Dig Deeper on Oracle and SQL

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.