Requires Free Membership to View
What is the best solution? An interesting question. My rules for a good solution are:
The query produces the right answer.
This is often quite difficult to confirm just by looking at the answer. However, you must meet this criterion. A wrong answer means the query is a bad solution, one hundred percent bad.
To obtain the right answer, a query development strategy is required. Run subqueries as simple standalone queries to show intermediate results. Keep a log or audit trail of queries and results, building up to more complex queries. Show details for GROUP BY queries, and display keys to ensure joins are working correctly, especially outer joins. Use comprehensive test data encompassing missing rows, extra rows, NULLs, etc.
The query finishes.
We've all submitted a query which turns the little hourglass icon on the computer screen into a calendar. Some of us have even routinely submitted one as the last task before we lock the keyboard and go home, so that it may run overnight and produce results by morning.
Make sure the query is efficient. Declare column indexes as required, and try a few different ways of writing the query. Sometimes subqueries can be written as joins and vice versa. Use database facilities such as EXPLAIN or SHOW PLAN to understand how the query is being executed.
The query is easy to understand.
This is subjective, but not as much as you might think. Shorter, simpler queries are better. A complex query can be difficult to maintain, should changes prove necessary. Remember, it may be you six months down the road who needs to understand it.
That said, here's a query to select customers who have placed an order every year for the last 5 years. You be the judge of how "good" it is:
select customername
from customers
inner
join orders
on customer.id = orders.customerid
where orderdate
between current_date - interval '5' year
and current_date
group
by customername
having count(distinct year(orderdate)) >= 5
The HAVING clause ensures that the number of different years is greater than or equal to 5, and not just equal to 5, because it could be 6. To me, the reason why it could be 6 is obvious, but only because the query is simple and uncluttered. Going back an interval of 5 years from today, a day in February 2004, puts us in February, 1999. That 5 year interval therefore spans six different calendar years. A customer could have an order in late 1999, 2000, 2001, 2002, 2003, and early 2004. This would not have been at all obvious (breaks Rule 3) if the query had said greater than 4 which produces the same result (meets Rule 1), i.e. selects customers with orders in either 5 or 6 different years.
Your requirement for customers who "placed orders every year" might seem to imply more than one order in each year, but the above query assumes (thereby possibly breaking Rule 1) that at least one order in each calendar year is sufficient. It's actually possible for a customer to have five orders in that five year interval and no two orders within a year of each other; however, it is obvious that the query is not doing anything about such nuances. The nicest thing about simplicity is how easily you can see the complexity that isn't there.
This was first published in February 2004
Join the conversationComment
Share
Comments
Results
Contribute to the conversation