This is an excerpt from Joe Celko's SQL Puzzles and Answers, Second Edition by Joe Celko. Printed with permission from Morgan Kaufmann, a division of Elsevier. Copyright 2007. For more information about this book, and other similar titles, please visit www.mkp.com.
Set theory has two symbols for subsets. One is a "horseshoe" on its side, which means that set A is contained within set B and is sometimes
called a proper subset. The other is the same symbol with a horizontal
bar under it, which means "contained in or equal to," which is
sometimes called just a subset or containment operator.
Standard SQL has never had an operator to compare tables against
each other. Several college textbooks on relational databases mention a
CONTAINS predicate that does not exist in standard SQL. Two such
offenders are An Introduction to Data Base Systems by Bipin C. Desai
(West Publishing, 1990, ISBN 0-314-66771-7) and Fundamentals of
Database Systems by Elmasri and Navthe (Benjamin Cummings, 1989,
ISBN 0-8053-0145-3). This predicate used to exist in the original System
R, IBM's first experimental SQL system, but it was dropped from later
SQL implementations because of the expense of running it.
The IN() predicate is a test for membership, not for subsets. For
those of you who remember your high school set theory, membership is
shown with a stylized epsilon with the containing set of the right side,
thus ∈. Membership is for one element, while a subset is itself a set, not
just an element.
Chris Date's puzzle in the December 1993 issue of Database
Programming & Design magazine ("A Matter of Integrity, Part II"
According to Date, December 1993) was to use a supplier and parts table
to find pairs of suppliers who provide exactly the same parts. This is the
same thing as finding two equal sets. Given his famous table:
How many ways can you find to do this problem?
Answer #1
One approach would be to do a FULL OUTER ...
To continue reading for free, register below or login
To read more you must become a member of SearchOracle.com
');
// -->

JOIN on each pair of
suppliers. Any parts that are not common to both would show up, but
would have generated NULLs in one of the columns derived from the
supplier who was not in the INNER JOIN portion. This tells you which
pairs are not matched, not who is. The final step is to remove these
nonmatching pairs from all possible pairs.
This is probably going to run very slowly. The EXCEPT operator is the
SQL equivalent of set difference.
Answer #2
The usual way of proving that two sets are equal to each other is to show
that set A contains set B, and set B contains set A. What you would
usually do in standard SQL would be to show that there exists no
element in set A that is not in set B, and therefore A is a subset of B. So
the first attempt is usually something like this:
Oops, this does not work because if a pair of suppliers has one item
in common, they will be returned.
Answer #3
You can use the NOT EXISTS predicate to imply the traditional test
mentioned in Answer #2.
Answer #4
Instead of using subsets, I thought I would look for another way to do set
equality. First, I join one supplier to another on their common parts,
eliminating the situation where supplier 1 is the same as supplier 2, so
that I have the intersection of the two sets. If the intersection has the
same number of pairs as each of the two sets has elements, then the two
sets are equal.
If there is an index on the supplier number in the SupParts table, it
can provide the counts directly as well as help with the join operation.
Answer #5
This is the same as Answer #4, but the GROUP BY has been replaced with
a SELECT DISTINCT clause:
Answer #6
This is a version of Answer #3, from Francisco Moreno, which has the NOT
EXISTS predicate replaced by set difference. He was using Oracle, and its
EXCEPT operator (called MINUS in their SQL dialect) is pretty fast.
Answer #7
Alexander Kuznetsov once more has a submission that improves the old
"counting matches in a join" approach:
The clever part of this query is that most optimizers can quickly find
the MIN() and MAX() values on a column because they are stored in the
statistics table.
Answer #8
Let's look at notation and some of the usual tests for equality:
The first equation is really the basis for the comparisons that use
joins. The second equation is done at the set level rather than the subset
level, and it implies this answer:
The idea is to return an empty set if tables A and B are equal. You
have to be careful about using the ALL clauses on the set operators if
you have duplicates. The good news is that these operators work with
rows and not at the column level, so this template will generalize to
any pairs of union-compatible tables. You do not have to know the
column names.
About the author
Joe Celko is a noted consultant, lecturer, teacher and one of the most-read SQL authors in the world. He is well-known for his 10 years of service on the ANSI SQL standards committee, his dependable help on assorted SQL newsgroups, his column in Intelligent Enterprise (which won several Reader's Choice Awards) and the war stories he tells to provide real-world insights into SQL programming. His best-selling books include Joe Celko's SQL for Smarties: Advanced SQL Programming, Joe Celko's SQL Puzzles and Answers, Joe Celko's Trees and Hierarchies in SQL for Smarties and Joe Celko's SQL Style, all published by Morgan Kaufmann.