We have a Users table with over 3 million user records in it (like a data warehouse). The requirement is to execute two queries on the Users table, where the return of the second query should not have any users that were returned in the first query.
For example, consider a sweepstake user list, where you want to send flyers to people, notifying them about prizes, but do not want to send two flyers to the same person, even if he fulfills the selection criteria for both sweepstakes.
Query 1 has conditions which state: All Users from New York, with age=25 (independently run, returns 500,000 users)
Query 2 has conditions, which state: All Users of age=25 (independently run, returns 700,000 users)
Query 1 should return 500,000
Query 2 should return 700,000 MINUS those users that were already returned in Query 1, i.e., only users of age 25, but who are not from New York
Any inputs will be much appreciated!
Fascinating question. Let's assume we need separate mailout lists, one for each query, and let's assume we need separate result sets (different columns) and separate filter conditions for each query. There are three ways to write the queries. Here's one way, using NOT EXISTS:
Query 1 select query-1-columns from Users where query-1-conditionsQuery 2 select query-2-columns from Users as X where query-2-conditions and not exists ( select userid from Users where userid = X.userid and query-1-conditions )
In addition to NOT EXISTS, you can also use NOT IN or LEFT OUTER JOIN syntax. See my recent answer The difference between NOT IN and NOT EXISTS.
However, if we need just one result set, if we were doing just one mailout but still had different conditions for different sets of users, just with no overlap, then another look is warranted.
In the given example, the specific conditions were inclusive. In other words, if we need only one result set, we can run just one query with age=25 and get everybody, New York or not. It's always nice to eliminate a step that isn't necessary.
If the conditions aren't inclusive, but it's still okay to return only one result set (same columns), just not the same person twice, then a simple UNION will suffice:
select query-columns from Users where query-1-conditions union select query-columns from Users where query-2-conditions
UNION removes duplicates.
An equivalent query is:
select query-columns from Users where ( query-1-conditions ) or ( query-2-conditions )
These two queries will not necessarily perform differently, but they might.
This was first published in September 2005