In the following query, what is the sequence of steps in which the query gets executed?
select min(price) from items_ordered group by item having item='Tent'
How is the query executed at the back-end by the database engine?
The first step is always the FROM clause. In your case, this is pretty straight-forward, because there's only one table, and there aren't any complicated joins to worry about. In a query with joins, these are evaluated in this first step. The joins are assembled to decide which rows to retrieve, with the ON clause conditions being the criteria for deciding which rows to join from each table. The result of the FROM clause is an intermediate result. You could think of this as a temporary table, consisting of combined rows which satisfy all the join conditions. (In your case the temporary table isn't actually built, because the optimizer knows it can just access your table directly without joining to any others.)
The next step is the WHERE clause. In your case, you don't have one, so the optimizer knows that it must retrieve all the rows from the intermediate result. In a query with a WHERE clause, each row in the intermediate result is evaluated according to the WHERE conditions, and either discarded or retained.
Next comes the GROUP BY. If there's a GROUP BY clause, the intermediate result is now partitioned into groups, one group for every combination of values in the columns in the GROUP BY clause. In your case, all the rows in each group have the same item. You can think of the partitioning as a sort, although a sort may not actually be involved if the intermediate result is already in item sequence (which happens when the rows retrieved by the FROM clause are accessed in item sequence via an index).
Now comes the HAVING clause. The HAVING clause operates once on each group, and all rows from groups which do not satisfy the HAVING clause are eliminated. In your case, something very interesting happens here. After having laboriously assembled (and possibly sorted) an entire intermediate result table, the optimizer now throws away all rows in every group except the group where the item is 'Tent' (a word on this later). In any case, after the HAVING clause has filtered the groups, a new intermediate result set is produced, and in this new intermediate result, there is only one row per group. This single row per group is allowed to have only the following types of columns:
- any of the columns in the GROUP BY clause
- aggregate functions operating on any other columns in the first intermediate result
Next comes the SELECT. From the rows of the new intermediate result produced by the GROUP BY and HAVING clauses, the SELECT now assembles the columns it needs. In particular, none of the columns in the original tables are available unless they are in the GROUP BY clause. In your case, you are selecting min(price), which, since it is an aggregate function, is a valid column coming from the GROUP BY and HAVING process.
Finally, the last step is the ORDER BY clause. In your case, there isn't one. In queries with both a GROUP BY and ORDER BY clause, you can reference columns in the ORDER BY only if they are in the new intermediate result produced by the grouping process, i.e. columns in the GROUP BY or aggregate functions (or expressions based on these).
Remember the interesting result I mentioned earlier? What I'd like you to do is run through these steps again, but this time, imagine that your HAVING condition (item='Tent') were written as a WHERE condition instead, and see if that might make a difference to what happens as the query is executed.
This was first published in September 2006