Home > Oracle Tips > Database Developer > SQL puzzles and answers: Finding equal sets
Oracle Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATABASE DEVELOPER

SQL puzzles and answers: Finding equal sets


Joe Celko
10.17.2006
Rating: -4.80- (out of 5)


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   


This is an excerpt from Joe Celko's SQL Puzzles and Answers, Second Edition by Joe Celko. Printed with permission from Morgan Kaufmann, a division of Elsevier. Copyright 2007. For more information about this book, and other similar titles, please visit www.mkp.com.

Set theory has two symbols for subsets. One is a "horseshoe" on its side, which means that set A is contained within set B and is sometimes called a proper subset. The other is the same symbol with a horizontal bar under it, which means "contained in or equal to," which is sometimes called just a subset or containment operator.

Standard SQL has never had an operator to compare tables against each other. Several college textbooks on relational databases mention a CONTAINS predicate that does not exist in standard SQL. Two such offenders are An Introduction to Data Base Systems by Bipin C. Desai (West Publishing, 1990, ISBN 0-314-66771-7) and Fundamentals of Database Systems by Elmasri and Navthe (Benjamin Cummings, 1989, ISBN 0-8053-0145-3). This predicate used to exist in the original System R, IBM's first experimental SQL system, but it was dropped from later SQL implementations because of the expense of running it.

The IN() predicate is a test for membership, not for subsets. For those of you who remember your high school set theory, membership is shown with a stylized epsilon with the containing set of the right side, thus ∈. Membership is for one element, while a subset is itself a set, not just an element.

Chris Date's puzzle in the December 1993 issue of Database Programming & Design magazine ("A Matter of Integrity, Part II" According to Date, December 1993) was to use a supplier and parts table to find pairs of suppliers who provide exactly the same parts. This is the same thing as finding two equal sets. Given his famous table:

CREATE TABLE SupParts
(sno CHAR(2) NOT NULL,
pno CHAR(2) NOT NULL,
PRIMARY KEY (sno, pno));

How many ways can you find to do this problem?

Answer #1

One approach would be to do a FULL OUTER JOIN on each pair of suppliers. Any parts that are not common to both would show up, but would have generated NULLs in one of the columns derived from the supplier who was not in the INNER JOIN portion. This tells you which pairs are not matched, not who is. The final step is to remove these nonmatching pairs from all possible pairs.

SELECT SP1.sno, SP2.sno
   FROM SupParts AS SP1
       INNER JOIN
       SupParts AS SP2
       ON SP1.pno = SP2.pno
              AND SP1.sno < SP2.sno
EXCEPT
SELECT DISTINCT SP1.sno, SP2.sno
    FROM SupParts AS SP1
         FULL OUTER JOIN
         SupParts AS SP2
            ON SP1.pno = SP2.pno
                 AND SP1.sno < SP2.sno)
WHERE SP1.sno IS NULL
      OR SP2.sno IS NULL;

This is probably going to run very slowly. The EXCEPT operator is the SQL equivalent of set difference.

Answer #2

The usual way of proving that two sets are equal to each other is to show that set A contains set B, and set B contains set A. What you would usually do in standard SQL would be to show that there exists no element in set A that is not in set B, and therefore A is a subset of B. So the first attempt is usually something like this:

SELECT DISTINCT SP1.sno, SP2.sno
  FROM SupParts AS SP1, SupParts AS SP2
  WHERE SP1.sno < SP2.sno
     AND SP1.pno IN (SELECT SP22.pno
              FROM SupParts AS SP22
            WHERE SP22.sno = SP2.sno)
AND SP2.pno IN (SELECT SP11.pno
              FROM SupParts AS SP11
            WHERE SP11.sno = SP1.sno));

Oops, this does not work because if a pair of suppliers has one item in common, they will be returned.

Answer #3

You can use the NOT EXISTS predicate to imply the traditional test mentioned in Answer #2.

SELECT DISTINCT SP1.sno, SP2.sno
  FROM SupParts AS SP1, SupParts AS SP2
  WHERE SP1.sno < SP2.sno
     AND NOT EXISTS (SELECT SP3.pno -- part in SP1 but not in
SP2
               FROM SupParts AS SP3
             WHERE SP1.sno = SP3.sno
                  AND SP3.pno
                            NOT IN (SELECT pno
                                FROM SupParts AS SP4
                               WHERE SP2.sno = SP4.sno))
     AND NOT EXISTS (SELECT SP5.pno -- part in SP2 but not in
SP1
                FROM SupParts AS SP5
             WHERE SP2.sno = SP5.sno
                 AND SP5.pno
                             NOT IN (SELECT pno
                                FROM SupParts AS SP4
                              WHERE SP1.sno = SP4.sno));

Answer #4

Instead of using subsets, I thought I would look for another way to do set equality. First, I join one supplier to another on their common parts, eliminating the situation where supplier 1 is the same as supplier 2, so that I have the intersection of the two sets. If the intersection has the same number of pairs as each of the two sets has elements, then the two sets are equal.

SELECT SP1.sno, SP2.sno
  FROM SupParts AS SP1
      INNER JOIN
      SupParts AS SP2
      ON SP1.pno = SP2.pno
      AND SP1.sno < SP2.sno
  GROUP BY SP1.sno, SP2.sno
HAVING (SELECT COUNT(*) -- one to one mapping EXISTS
           FROM SupParts AS SP3
         WHERE SP3.sno = SP1.sno)
 = (SELECT COUNT(*)
           FROM SupParts AS SP4
         WHERE SP4.sno = SP2.sno);

If there is an index on the supplier number in the SupParts table, it can provide the counts directly as well as help with the join operation.

Answer #5

This is the same as Answer #4, but the GROUP BY has been replaced with a SELECT DISTINCT clause:

SELECT DISTINCT SP1.sno, SP2.sno
    FROM (SupParts AS SP1
          INNER JOIN
          SupParts AS SP2
          ON SP1.pno = SP2.pno
                AND SP1.sno < SP2.sno)
WHERE (SELECT COUNT(*)
              FROM SupParts AS SP3
             WHERE SP3.sno = SP1.sno)
= (SELECT COUNT(*)
              FROM SupParts AS SP4
            WHERE SP4.sno = SP2.sno);

Answer #6

This is a version of Answer #3, from Francisco Moreno, which has the NOT EXISTS predicate replaced by set difference. He was using Oracle, and its EXCEPT operator (called MINUS in their SQL dialect) is pretty fast.

SELECT DISTINCT SP1.sno, SP2.sno
   FROM SupParts AS SP1, SupParts AS SP2
   WHERE SP1.sno < SP2.sno
      AND NOT EXISTS (SELECT SP3.pno -- part in SP1 but not in
SP2
                           FROM SupParts AS SP3
                           WHERE SP1.sno = SP3.sno
                           EXCEPT
                           SELECT SP4.pno
                             FROM SupParts AS SP4
                           WHERE SP2.sno = SP4.sno
   AND NOT EXISTS (SELECT SP5.pno -- part in SP2 but notin
SP1
                             FROM SupParts AS SP5
                           WHERE SP2.sno = SP5.sno
                           EXCEPT
                           SELECT SP6.pno
                             FROM SupParts AS SP6
                           WHERE SP1.sno = SP6.sno);

Answer #7

Alexander Kuznetsov once more has a submission that improves the old "counting matches in a join" approach:

SELECT A.sno, B.sno AS sno1
   FROM (SELECT sno, COUNT(*), MIN(pno), MAX(pno)
           FROM SubParts GROUP BY sno)
        AS A(cnt, min_pno, max_pno)
        INNER JOIN
        (SELECT sno, COUNT(*), MIN(pno), MAX(pno)
            FROM SubParts GROUP BY sno)
        AS B(cnt, min_pno, max_pno)
-- four conditions filter out most permutations
        ON A.cnt = B.cnt
              AND A.min_pno = B.min_pno
              AND A.max_pno = B.max_pno
              AND A.sno < B.sno
-- Expensive inner select below does not have to execute for
every pair
  WHERE A.cnt
           = (SELECT COUNT(*)
                   FROM SubParts AS A1,
                               SubParts AS B1
               WHERE A1.pno = B1.pno
                    AND A1.sno = A.sno
                    AND B1.sno = B.sno);

sn sn
=======
ab bb
aq pq

The clever part of this query is that most optimizers can quickly find the MIN() and MAX() values on a column because they are stored in the statistics table.

Answer #8

Let's look at notation and some of the usual tests for equality:

((A ⊆ B) = (B ⊆ A)) ⇒ (A = B)
((A ∪ B) = (B ∩ A)) ⇒ (A = B)

The first equation is really the basis for the comparisons that use joins. The second equation is done at the set level rather than the subset level, and it implies this answer:

SELECT DISTINCT 'not equal'
   FROM (SELECT * FROM A)
        INTERSECT
        SELECT * FROM B)
        EXCEPT
      (SELECT * FROM A)
         UNION
         SELECT * FROM B);

The idea is to return an empty set if tables A and B are equal. You have to be careful about using the ALL clauses on the set operators if you have duplicates. The good news is that these operators work with rows and not at the column level, so this template will generalize to any pairs of union-compatible tables. You do not have to know the column names.

About the author

Joe Celko is a noted consultant, lecturer, teacher and one of the most-read SQL authors in the world. He is well-known for his 10 years of service on the ANSI SQL standards committee, his dependable help on assorted SQL newsgroups, his column in Intelligent Enterprise (which won several Reader's Choice Awards) and the war stories he tells to provide real-world insights into SQL programming. His best-selling books include Joe Celko's SQL for Smarties: Advanced SQL Programming, Joe Celko's SQL Puzzles and Answers, Joe Celko's Trees and Hierarchies in SQL for Smarties and Joe Celko's SQL Style, all published by Morgan Kaufmann.

Rate this Tip
To rate tips, you must be a member of SearchOracle.com.
Register now to start rating these tips. Log in if you are already a member.




Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   


RELATED CONTENT
Database Developer
How do you create a link between two databases inside a stored procedure?
PL/SQL do's and don't's: Five questions with Steven Feuerstein
Mike Ault's Oracle "good practices": Oracle coding
Easy Oracle PL/SQL programming: Assignments, initializations and NULLs
Introduction to BPEL
Developer and DBA: Working together for greater efficiency
NULLs in WHERE clauses can be deceptive
The Top 10 (more or less) J2EE best practices
Build a servlet-based application that executes SQL statements against a database
Joining tables in SQL queries

Oracle SQL
SQL to count values of a status code
Counting NULL columns
Detail rows for accounts that occur three times
Counting a row's NULL columns
Oracle's free SQL Developer adds database migration tool
Latest transaction if no recent prior transactions
Three ways SQL can count rows by type
SQL to select only certain times within a date range
Oracle SQL to test for numerics
Number of rows in multiple tables
Oracle SQL Research

Oracle database design
Weighing remote database administration pros and cons takes care
Oracle Database 11g makes waves at Burlington Coat Factory
How to create a database link in Oracle
Data modeling tools no substitute for hard work
How do I do that in Oracle?
The Oracle Database user's guide to Oracle OpenWorld 2007
Oracle OpenWorld 2007 Special Report
How many redo log files?
How to move tables from system tablespace to user tablespace
ORA-12560 error with Oracle 10g Instant Client

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
autonomous transaction  (SearchOracle.com)
CFML  (SearchOracle.com)
dynamic SQL  (SearchOracle.com)
foreign key  (SearchOracle.com)
Java Database Connectivity  (SearchOracle.com)
Open Database Connectivity  (SearchOracle.com)
Oracle  (SearchOracle.com)
stored procedure  (SearchOracle.com)
The Open Group  (SearchOracle.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary

DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.

HomeNewsTopicsTipsAsk the ExpertsMultimediaWhite PapersProductsBlogs
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides enterprise IT professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective IT purchase decisions and managing their organizations' IT projects - with its network of technology-specific Web sites, events and magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Reprints  |  Site Map




All Rights Reserved, Copyright 2003 - 2008, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts