Problem solve Get help with specific problems with your technologies, process and projects.

The difference between NOT IN and NOT EXISTS

What's the difference between NOT IN and NOT EXISTS in SQL? Which is better to use in SQL queries for the performance aspects?

What's the difference between NOT IN and NOT EXISTS in SQL? Which is better to use in SQL queries for the performance...


For the first question, let's assume we're talking about the difference in syntax between NOT IN and NOT EXISTS when both have subqueries.

Here are examples of the syntax:

select client_name
  from clients
 where client_id not in
       ( select client_id
           from clientwarnings )
select client_name
  from clients as x
 where not exists
       ( select client_id
           from clientwarnings
          where client_id = x.client_id )

In the NOT IN example, the subquery identifies a set of client_ids, against which each client_id in the clients table can be compared. In the NOT EXISTS example, the subquery is correlated and can be thought of as doing a lookup for each client_id in the clients table.

Your second question asks which is preferred for performance reasons. My answer is: it's your call. Don't trust your instinct, because it might be wrong.

My guess would be that the NOT IN query has to build a list for the subquery, which is slow, but then it can do in-memory compares, which is fast, while the NOT EXISTS query can use matching indexes, which are faster than building a memory table ... As you can see, I am easily confused.

And what about the LEFT OUTER JOIN testing for IS NULL? A perfectly good alternate solution:

select c.client_name
  from clients as c
left outer
  join clientwarnings as cw
    on c.client_id = cw.client_id
 where cw.client_id is null 

Bottom line: use EXPLAIN on your queries to look at the execution plans and optimizer costs for each query, and base your decision about performance on an understanding of what your optimizer is telling you about your data.

All things being equal—and sometimes three different syntaxes such as the above can produce the same execution plan—you might want to consider which of the above three queries is the easiest to maintain, based on how quickly a maintenance programmer can recognize what the query is supposed to be doing.

The answer is: none of them! They aren't documented! Let's add the following line of code at the beginning of each of the three queries above:

/* returns names of clients who have no warnings */

Now each of them makes more sense, right?

Dig Deeper on Oracle and SQL