Ask the Expert

Query 2 should not include results from Query 1

We have a Users table with over 3 million user records in it (like a data warehouse). The requirement is to execute two queries on the Users table, where the return of the second query should not have any users that were returned in the first query.

For example, consider a sweepstake user list, where you want to send flyers to people, notifying them about prizes, but do not want to send two flyers to the same person, even if he fulfills the selection criteria for both sweepstakes.

So --

Query 1 has conditions which state: All Users from New York, with age=25 (independently run, returns 500,000 users)

Query 2 has conditions, which state: All Users of age=25 (independently run, returns 700,000 users)

Expected Output:

Query 1 should return 500,000

Query 2 should return 700,000 MINUS those users that were already returned in Query 1, i.e., only users of age 25, but who are not from New York

Any inputs will be much appreciated!

    Requires Free Membership to View

Fascinating question. Let's assume we need separate mailout lists, one for each query, and let's assume we need separate result sets (different columns) and separate filter conditions for each query. There are three ways to write the queries. Here's one way, using NOT EXISTS:

Query 1
select query-1-columns
  from Users
 where query-1-conditions
Query 2
select query-2-columns
  from Users as X
 where query-2-conditions
   and not exists
       ( select userid
           from Users
          where userid = X.userid 
            and query-1-conditions )

In addition to NOT EXISTS, you can also use NOT IN or LEFT OUTER JOIN syntax. See my recent answer The difference between NOT IN and NOT EXISTS.

However, if we need just one result set, if we were doing just one mailout but still had different conditions for different sets of users, just with no overlap, then another look is warranted.

In the given example, the specific conditions were inclusive. In other words, if we need only one result set, we can run just one query with age=25 and get everybody, New York or not. It's always nice to eliminate a step that isn't necessary.

If the conditions aren't inclusive, but it's still okay to return only one result set (same columns), just not the same person twice, then a simple UNION will suffice:

select query-columns
  from Users
 where query-1-conditions
union
select query-columns
  from Users 
 where query-2-conditions

UNION removes duplicates.

An equivalent query is:

select query-columns
  from Users
 where ( query-1-conditions )
    or ( query-2-conditions )

These two queries will not necessarily perform differently, but they might.

This was first published in September 2005

There are Comments. Add yours.

 
TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
Sort by: OldestNewest

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to: