Problem solve Get help with specific problems with your technologies, process and projects.

# SQL puzzles and answers: Finding equal sets

## A SQL puzzle in the December 1993 issue of Database Programming & Design magazine was to use a supplier and parts table to find pairs of suppliers who provide exactly the same parts. This is the same thing as finding two equal sets. How many ways can you find to do this problem?

Set theory has two symbols for subsets. One is a "horseshoe" on its side, which means that set A is contained within set B and is sometimes called a proper subset. The other is the same symbol with a horizontal bar under it, which means "contained in or equal to," which is sometimes called just a subset or containment operator.

Standard SQL has never had an operator to compare tables against each other. Several college textbooks on relational databases mention a CONTAINS predicate that does not exist in standard SQL. Two such offenders are An Introduction to Data Base Systems by Bipin C. Desai (West Publishing, 1990, ISBN 0-314-66771-7) and Fundamentals of Database Systems by Elmasri and Navthe (Benjamin Cummings, 1989, ISBN 0-8053-0145-3). This predicate used to exist in the original System R, IBM's first experimental SQL system, but it was dropped from later SQL implementations because of the expense of running it.

The IN() predicate is a test for membership, not for subsets. For those of you who remember your high school set theory, membership is shown with a stylized epsilon with the containing set of the right side, thus ∈. Membership is for one element, while a subset is itself a set, not just an element.

Chris Date's puzzle in the December 1993 issue of Database Programming & Design magazine ("A Matter of Integrity, Part II" According to Date, December 1993) was to use a supplier and parts table to find pairs of suppliers who provide exactly the same parts. This is the same thing as finding two equal sets. Given his famous table:

```CREATE TABLE SupParts
(sno CHAR(2) NOT NULL,
pno CHAR(2) NOT NULL,
PRIMARY KEY (sno, pno));
```

How many ways can you find to do this problem?

One approach would be to do a FULL OUTER JOIN on each pair of suppliers. Any parts that are not common to both would show up, but would have generated NULLs in one of the columns derived from the supplier who was not in the INNER JOIN portion. This tells you which pairs are not matched, not who is. The final step is to remove these nonmatching pairs from all possible pairs.

```SELECT SP1.sno, SP2.sno
FROM SupParts AS SP1
INNER JOIN
SupParts AS SP2
ON SP1.pno = SP2.pno
AND SP1.sno < SP2.sno
EXCEPT
SELECT DISTINCT SP1.sno, SP2.sno
FROM SupParts AS SP1
FULL OUTER JOIN
SupParts AS SP2
ON SP1.pno = SP2.pno
AND SP1.sno < SP2.sno)
WHERE SP1.sno IS NULL
OR SP2.sno IS NULL;
```

This is probably going to run very slowly. The EXCEPT operator is the SQL equivalent of set difference.

The usual way of proving that two sets are equal to each other is to show that set A contains set B, and set B contains set A. What you would usually do in standard SQL would be to show that there exists no element in set A that is not in set B, and therefore A is a subset of B. So the first attempt is usually something like this:

```SELECT DISTINCT SP1.sno, SP2.sno
FROM SupParts AS SP1, SupParts AS SP2
WHERE SP1.sno < SP2.sno
AND SP1.pno IN (SELECT SP22.pno
FROM SupParts AS SP22
WHERE SP22.sno = SP2.sno)
AND SP2.pno IN (SELECT SP11.pno
FROM SupParts AS SP11
WHERE SP11.sno = SP1.sno));
```

Oops, this does not work because if a pair of suppliers has one item in common, they will be returned.

You can use the NOT EXISTS predicate to imply the traditional test mentioned in Answer #2.

```SELECT DISTINCT SP1.sno, SP2.sno
FROM SupParts AS SP1, SupParts AS SP2
WHERE SP1.sno < SP2.sno
AND NOT EXISTS (SELECT SP3.pno -- part in SP1 but not in
SP2
FROM SupParts AS SP3
WHERE SP1.sno = SP3.sno
AND SP3.pno
NOT IN (SELECT pno
FROM SupParts AS SP4
WHERE SP2.sno = SP4.sno))
AND NOT EXISTS (SELECT SP5.pno -- part in SP2 but not in
SP1
FROM SupParts AS SP5
WHERE SP2.sno = SP5.sno
AND SP5.pno
NOT IN (SELECT pno
FROM SupParts AS SP4
WHERE SP1.sno = SP4.sno));
```

Instead of using subsets, I thought I would look for another way to do set equality. First, I join one supplier to another on their common parts, eliminating the situation where supplier 1 is the same as supplier 2, so that I have the intersection of the two sets. If the intersection has the same number of pairs as each of the two sets has elements, then the two sets are equal.

```SELECT SP1.sno, SP2.sno
FROM SupParts AS SP1
INNER JOIN
SupParts AS SP2
ON SP1.pno = SP2.pno
AND SP1.sno < SP2.sno
GROUP BY SP1.sno, SP2.sno
HAVING (SELECT COUNT(*) -- one to one mapping EXISTS
FROM SupParts AS SP3
WHERE SP3.sno = SP1.sno)
= (SELECT COUNT(*)
FROM SupParts AS SP4
WHERE SP4.sno = SP2.sno);
```

If there is an index on the supplier number in the SupParts table, it can provide the counts directly as well as help with the join operation.

This is the same as Answer #4, but the GROUP BY has been replaced with a SELECT DISTINCT clause:

```SELECT DISTINCT SP1.sno, SP2.sno
FROM (SupParts AS SP1
INNER JOIN
SupParts AS SP2
ON SP1.pno = SP2.pno
AND SP1.sno < SP2.sno)
WHERE (SELECT COUNT(*)
FROM SupParts AS SP3
WHERE SP3.sno = SP1.sno)
= (SELECT COUNT(*)
FROM SupParts AS SP4
WHERE SP4.sno = SP2.sno);
```

This is a version of Answer #3, from Francisco Moreno, which has the NOT EXISTS predicate replaced by set difference. He was using Oracle, and its EXCEPT operator (called MINUS in their SQL dialect) is pretty fast.

```SELECT DISTINCT SP1.sno, SP2.sno
FROM SupParts AS SP1, SupParts AS SP2
WHERE SP1.sno < SP2.sno
AND NOT EXISTS (SELECT SP3.pno -- part in SP1 but not in
SP2
FROM SupParts AS SP3
WHERE SP1.sno = SP3.sno
EXCEPT
SELECT SP4.pno
FROM SupParts AS SP4
WHERE SP2.sno = SP4.sno
AND NOT EXISTS (SELECT SP5.pno -- part in SP2 but notin
SP1
FROM SupParts AS SP5
WHERE SP2.sno = SP5.sno
EXCEPT
SELECT SP6.pno
FROM SupParts AS SP6
WHERE SP1.sno = SP6.sno);
```

Alexander Kuznetsov once more has a submission that improves the old "counting matches in a join" approach:

```SELECT A.sno, B.sno AS sno1
FROM (SELECT sno, COUNT(*), MIN(pno), MAX(pno)
FROM SubParts GROUP BY sno)
AS A(cnt, min_pno, max_pno)
INNER JOIN
(SELECT sno, COUNT(*), MIN(pno), MAX(pno)
FROM SubParts GROUP BY sno)
AS B(cnt, min_pno, max_pno)
-- four conditions filter out most permutations
ON A.cnt = B.cnt
AND A.min_pno = B.min_pno
AND A.max_pno = B.max_pno
AND A.sno < B.sno
-- Expensive inner select below does not have to execute for
every pair
WHERE A.cnt
= (SELECT COUNT(*)
FROM SubParts AS A1,
SubParts AS B1
WHERE A1.pno = B1.pno
AND A1.sno = A.sno
AND B1.sno = B.sno);

sn sn
=======
ab bb
aq pq
```

The clever part of this query is that most optimizers can quickly find the MIN() and MAX() values on a column because they are stored in the statistics table.

Let's look at notation and some of the usual tests for equality:

```((A ⊆ B) = (B ⊆ A)) ⇒ (A = B)
((A ∪ B) = (B ∩ A)) ⇒ (A = B)
```

The first equation is really the basis for the comparisons that use joins. The second equation is done at the set level rather than the subset level, and it implies this answer:

```SELECT DISTINCT 'not equal'
FROM (SELECT * FROM A)
INTERSECT
SELECT * FROM B)
EXCEPT
(SELECT * FROM A)
UNION
SELECT * FROM B);
```

The idea is to return an empty set if tables A and B are equal. You have to be careful about using the ALL clauses on the set operators if you have duplicates. The good news is that these operators work with rows and not at the column level, so this template will generalize to any pairs of union-compatible tables. You do not have to know the column names.

Joe Celko is a noted consultant, lecturer, teacher and one of the most-read SQL authors in the world. He is well-known for his 10 years of service on the ANSI SQL standards committee, his dependable help on assorted SQL newsgroups, his column in Intelligent Enterprise (which won several Reader's Choice Awards) and the war stories he tells to provide real-world insights into SQL programming. His best-selling books include Joe Celko's SQL for Smarties: Advanced SQL Programming, Joe Celko's SQL Puzzles and Answers, Joe Celko's Trees and Hierarchies in SQL for Smarties and Joe Celko's SQL Style, all published by Morgan Kaufmann.

## SearchDataManagement

• ### Oracle brings GoldenGate data integration service to cloud

Oracle is making its GoldenGate real time data technology available on its second-generation Oracle Cloud Infrastructure platform...

• ### Era Software raises \$15.25M for enterprise data management

The startup that began as the EraDB time series database is advancing its efforts with new funding and a cloud service for its ...

• ### Dgraph GraphQL database users detail graph use cases

Graph DB vendor Dgraph Labs is expanding its AWS cloud footprint with new regions and adding change data capture capabilities in ...

• ### TigerGraph unveils support for GCP, adds new connectors

Graph database vendor TigerGraph unveiled support for Google Cloud and new connectors to Snowflake and Tableau on April 21 during...

• ### Startup Veezoo emerges from stealth with NLQ-based platform

Aiming to be 'Siri for enterprises,' an analytics startup emerged from stealth with a platform that enables users to interact ...

• ### 15 data science tools to consider using in 2021

Numerous tools are available for data science applications. Read about 15, including their features, capabilities and uses, to ...

## SearchSAP

• ### S/4HANA Cloud SaaS ERP: Buying team overview

SAP's multi-tenant SaaS ERP, S/4HANA Cloud, is a viable choice for companies that need ease in their infrastructure management. ...

• ### SAP forms financial services partnership with Dediq

SAP and financial industry investment firm Dediq are forming a new business unit to develop applications that help banks and ...

• ### Unpatched applications threaten SAP security

Cyberattacks are a significant threat to unpatched, unprotected SAP applications, according to a new threat intelligence report ...

## SearchSQLServer

• ### SQL Server database design best practices and tips for DBAs

Good database design is a must to meet processing needs in SQL Server systems. In a webinar, consultant Koen Verbeeck offered ...

• ### SQL Server in Azure database choices and what they offer users

SQL Server databases can be moved to the Azure cloud in several different ways. Here's what you'll get from each of the options ...

• ### Using a LEFT OUTER JOIN vs. RIGHT OUTER JOIN in SQL

In this book excerpt, you'll learn LEFT OUTER JOIN vs. RIGHT OUTER JOIN techniques and find various examples for creating SQL ...

## TheServerSide.com

• ### Incorporate diversity and inclusion in technology design

DEI in technology is about more than creating a diverse workplace. We talked to a few DEI professionals about how teams build ...

• ### Microsoft previews OpenJDK distro to the delight of devs

In a move meant to attract more Java developers to its Azure cloud and further support the Java community, Microsoft launched a ...

• ### Supreme Court ruling on Java APIs eases developer worries

Now that the Supreme Court has ruled for Google over Oracle in their high-stakes copyright battle over Java APIs, developers can ...

## SearchDataCenter

• ### Nvidia SDK simulates quantum computing circuits on GPU systems

Nvidia edged its way into the quantum computing market with an SDK that simulates quantum circuits by adding horsepower to ...

• ### Programmable processor technology for next-gen data centers

The right processing technology can benefit your data center. Learn about advancements in CPU technologies, recent vendor ...

• ### Data processing units accelerate infrastructure performance

DPUs often run on networking packets to move information in the data center, instead of supporting processing workflows. Get an ...

## SearchContentManagement

• ### Hyland gets digital asset management tech with Nuxeo buy

By acquiring a smaller competitor's digital asset management platform, Hyland looks to build on its 2020 purchase of Alfresco, ...

• ### OpenText releases Cloud Editions content services updates

OpenText CE 21.2 includes federated document compliance that extends to Microsoft Office 365, along with a revamped content ...

As the pandemic disrupts paper workflows, Adobe courts small business users with simple webforms, digital signatures and payments...

## SearchHRSoftware

• ### Shift to HR shared services could save Connecticut millions

By consolidating 17 separate HR operations into one shared service, Connecticut expects to save significantly on costs, most of ...

• ### 10 steps to support your HCM system post go-live

To ensure a new system's success, HR leaders need to develop a plan for how they will support their new HCM system post go-live. ...

• ### Face mask detection a newcomer to employee surveillance

Face mask detection has emerged as another form of employee surveillance technology, and its adoption may be helped indirectly by...

Close