|
Whenever a query involves a number of tables, I find it helpful
always to start writing the join with the most restrictive table first.
For example, if you are writing a query to retrieve all the books
from a specific publisher, the restriction in this query would be
the specific publisher. Think of it as the difference between:
versus
Does that make sense? See the difference?
Okay, so my rules for writing the FROM clause are:
Start with the most restrictive table first.
If the first table has any many-to-one relationships,
join those tables next. Specifically, for each row of
the first table, there is at most one row of the joined table(s).
If any of the tables thus far joined has any one-to-many relationships,
you can join only one of these. Otherwise, if you
join more than one one-to-many table, you get cross
join effects.
Continue by joining many-to-one tables as necessary.
So, in your query, the table you want to start with is books.
Why is this most restrictive? Because you want only co-authored books,
and there are many fewer co-authored books than those written by a
single author.
Next, join to publishers, because a given book has just one publisher.
(If a book has two publishers, it isn't the same book. It might have
the same name, but it'll have a different ISBN.)
Now you can join to writtenby, which is your books-to-authors relationship
table. Specifically, for a given book, it is a one-to-many
relationship.
Finally, you can join to authors, because from writtenby to authors, it's
a many-to-one relationship.
This should all make a lot more sense if you were to abandon
the old style FROM clause comma-delimited list of tables with WHERE conditions,
and started using JOIN syntax:
SELECT book.id
, book.title
, book.cost
, author.id
, author.name
, publisher.id
, publisher.name
FROM book
INNER
JOIN publisher
ON publisher.id = book.publid
INNER
JOIN writtenby
ON writtenby.bookid = book.id
INNER
JOIN author
ON author.id = writtenby.authorid
Doesn't that look better? The intent is certainly clearer, I think.
Now the only remaining task is to restrict the first table, the
books table, to only co-authored books. That requires a subquery:
WHERE book.id IN
( SELECT bookid
FROM writtenby
GROUP
BY bookid
HAVING COUNT(*) > 1 )
The subquery is uncorrelated, which means it can be executed
before the main query, to produce a list of book ids for only
co-authored books. Thus the main query is most efficient because
it retrieves the publisher and author data only for those books.
|