# Customers who bought X at least once and Y at least twice

I have a table CUSTOMERS (customer_id) and a table PURCHASES (customer_id, purchase_date, product_id). I am trying to find the distinct customers that have bought at least once product_id=X and, in the 12 previous months, bought at least twice product_id=Y. Any idea that could do the job in a minimum time?

Solutions that do the job "in a minimum time" are always challenging. Success depends on the existence of proper indexes. Assuming these are in place, we can still sometimes see markedly different results for solutions written with different query constructions.
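As a rough illustration of what "proper indexes" might mean for this problem, here is a minimal sketch. The index names are hypothetical and the column order is an assumption to test against your own data and DBMS, not a rule.

```sql
-- Hypothetical index: equality columns first (customer_id, product_id),
-- then the range column (purchase_date). Intended to support lookups of
-- one customer's purchases of one product within a date range.
create index purchases_cust_prod_date
    on PURCHASES (customer_id, product_id, purchase_date);

-- Hypothetical alternative that leads with product_id, which may suit a
-- plan that first finds all purchases of X or Y and then groups by customer.
create index purchases_prod_cust_date
    on PURCHASES (product_id, customer_id, purchase_date);
```

Whether the optimizer uses either index depends on the DBMS and the data distribution, so check the execution plan before keeping both.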

The first step towards a solution, the most important step, is to make sure we understand the exact requirements. In this case, we don't care about purchase data, just that it exists. We aren't actually retrieving anything from the PURCHASES table! If you had said "give the date of the latest Product X purchase, and the total number of Product Y purchases" then we'd need to write a totally different query.
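For contrast, here is a sketch of what that different query might look like if the purchase data itself were wanted. The column aliases latest_x_purchase and total_y_purchases are made up for illustration, and this version does not apply the "at least once" and "at least twice" conditions.

```sql
-- Sketch only: returns purchase details rather than just customer ids.
select C.customer_id
     , max( case when P.product_id = 'X'
                 then P.purchase_date end ) as latest_x_purchase  -- null if no X purchase
     , sum( case when P.product_id = 'Y'
                 then 1 else 0 end )        as total_y_purchases
  from CUSTOMERS as C
inner
  join PURCHASES as P
    on C.customer_id
     = P.customer_id
 where P.product_id in ( 'X', 'Y' )
group
    by C.customer_id
```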

Here's one solution:

```sql
select customer_id
  from CUSTOMERS as C
 where exists
       ( select *
           from PURCHASES
          where customer_id
              = C.customer_id
            and product_id = 'X' )
   and 2 <=
       ( select count(*)
           from PURCHASES
          where customer_id
              = C.customer_id
            and product_id = 'Y'
            and purchase_date
                between date1
                    and date2 )
```

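Each of the two subqueries above is a correlated subquery. This means that it considers only those purchases which match the customer_id of the current row in the main query. The names date1 and date2 are placeholders for the start and end of the 12-month period you're interested in.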

One advantage of using correlated subqueries is that it's fairly easy to understand what they're doing. They "read" well. In this case, though, there are two of them, which leaves open the possibility that the database optimizer will generate two separate joins in order to execute them. (Correlated subqueries are usually executed as joins.)

Here's a different solution:

```sql
select C.customer_id
  from CUSTOMERS as C
inner
  join PURCHASES as P
    on C.customer_id
     = P.customer_id
group
    by C.customer_id
having 0 <
       sum(
         case when P.product_id = 'X'
              then 1 else 0 end
          )
   and 2 <=
       sum(
         case when P.product_id = 'Y'
               and P.purchase_date
                   between date1
                       and date2
              then 1 else 0 end
          )
```

Here you can see we've taken matters into our own hands and performed one join explicitly. Note that we're still just selecting from the CUSTOMERS table. The WHERE EXISTS construction is replaced by taking a count and making sure it's not zero. The counts are achieved by obtaining the SUM of a column of 1's and 0's.

Which of the solutions is faster? Try them both, and see.
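Before timing anything, it helps to confirm that both queries return the same customers. Here is a minimal sketch of a test set. The rows are made up, the column types are assumptions, and the date-literal format may need adjusting for your DBMS. For this data, substitute '2019-01-01' for date1 and '2019-12-31' for date2.

```sql
-- Hypothetical test tables; the types are assumptions.
create table CUSTOMERS
( customer_id   integer     not null primary key );

create table PURCHASES
( customer_id   integer     not null
, purchase_date date        not null
, product_id    varchar(10) not null );

insert into CUSTOMERS (customer_id) values (1);
insert into CUSTOMERS (customer_id) values (2);
insert into CUSTOMERS (customer_id) values (3);
insert into CUSTOMERS (customer_id) values (4);
insert into CUSTOMERS (customer_id) values (5);  -- no purchases at all

-- Customer 1: X once, Y twice inside the range (should qualify)
insert into PURCHASES values (1, '2019-06-01', 'X');
insert into PURCHASES values (1, '2019-03-01', 'Y');
insert into PURCHASES values (1, '2019-09-15', 'Y');

-- Customer 2: X once, Y only once
insert into PURCHASES values (2, '2019-05-10', 'X');
insert into PURCHASES values (2, '2019-07-04', 'Y');

-- Customer 3: Y twice inside the range, but never X
insert into PURCHASES values (3, '2019-02-02', 'Y');
insert into PURCHASES values (3, '2019-08-08', 'Y');

-- Customer 4: X once, Y twice, but one Y falls outside the range
insert into PURCHASES values (4, '2019-04-04', 'X');
insert into PURCHASES values (4, '2018-11-20', 'Y');
insert into PURCHASES values (4, '2019-10-10', 'Y');
```

With these rows, both queries should return only customer 1. Realistic timings need a realistic volume of data, so treat this set as a correctness check rather than a benchmark.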
