The difference between NOT IN and NOT EXISTS
What's the difference between NOT IN and NOT EXISTS in SQL? Which is better to use in SQL queries for the performance aspects?
What's the difference between NOT IN and NOT EXISTS in SQL? Which is better to use in SQL queries for the performance...
Continue Reading This Article
Enjoy this article as well as all of our content, including E-Guides, news, tips and more.
aspects?
For the first question, let's assume we're talking about the difference in syntax between NOT IN and NOT EXISTS when both have subqueries.
Here are examples of the syntax:
select client_name from clients where client_id not in ( select client_id from clientwarnings )
select client_name from clients as x where not exists ( select client_id from clientwarnings where client_id = x.client_id )
In the NOT IN example, the subquery identifies a set of client_ids, against which each client_id in the clients table can be compared. In the NOT EXISTS example, the subquery is correlated and can be thought of as doing a lookup for each client_id in the clients table.
Your second question asks which is preferred for performance reasons. My answer is: it's your call. Don't trust your instinct, because it might be wrong.
My guess would be that the NOT IN query has to build a list for the subquery, which is slow, but then it can do in-memory compares, which is fast, while the NOT EXISTS query can use matching indexes, which are faster than building a memory table ... As you can see, I am easily confused.
And what about the LEFT OUTER JOIN testing for IS NULL? A perfectly good alternate solution:
select c.client_name from clients as c left outer join clientwarnings as cw on c.client_id = cw.client_id where cw.client_id is null
Bottom line: use EXPLAIN on your queries to look at the execution plans and optimizer costs for each query, and base your decision about performance on an understanding of what your optimizer is telling you about your data.
All things being equal—and sometimes three different syntaxes such as the above can produce the same execution plan—you might want to consider which of the above three queries is the easiest to maintain, based on how quickly a maintenance programmer can recognize what the query is supposed to be doing.
The answer is: none of them! They aren't documented! Let's add the following line of code at the beginning of each of the three queries above:
/* returns names of clients who have no warnings */
Now each of them makes more sense, right?