I have the following SQL statement:
SELECT * FROM myTable WHERE pkColumn in (pkValue1, pkValue2, pkValue3, pkValue4, pkValue5, ...)
I now need to substitute pkColumn in that statement with a composite primary key of, let's say, two integer columns.
SELECT * FROM myTable WHERE (pkColumn1 = pk1Value1 AND pkColumn2 = pk2Value1) OR (pkColumn1 = pk1Value2 AND pkColumn2 = pk2Value2) OR ...
The problem, besides performance issues, is that the statement gets very long very fast and statements are limited.
Is there a better solution that does not make use of a temporary table?
This is another of those sneaky questions which seem to be designed to test my knowledge of performance. Many years ago, I realized that performance is a DBA (database administrator) concern, because there are so many server parameters that can be tweaked and tuned to improve it. And since I'm not a DBA, this implied that I needn't be too concerned with performance, beyond the minimum of "good SQL hygiene" (indexing your join and search columns, writing sargable predicates, and so on).
Nevertheless, your question is a good one, and perhaps my very limited DBA knowledge will help you set your direction.
If you can pass a set of criteria into the optimizer using "hardcoded" values in a query, then that query is going to perform as fast as possible. My reason for believing this to be true is that once the WHERE clause has been parsed, all those pkColumnN=pk1ValueN expressions, together with their ANDs and ORs, will be compiled into memory. Not being a DBA, I can only imagine this process, which probably involves some sort of slick bytecode stack somewhere that is used to evaluate each row. But the key point here is that it will be in memory, which is the most efficient data store of all.
You are right that there will be a limit to the size of the SQL statement that you can construct this way, but this limit will be reasonably large, depending on which database system you're using. In MySQL, for example, you can change this limit via the MAX_ALLOWED_PACKET variable, but its initial default value is 1 megabyte, which is a pretty large SQL statement.
The only other alternative to large cumbersome SQL statements (besides a temp table) would be to see if you can simplify the SQL statement. For example, maybe you could rewrite your query like this:
SELECT * FROM myTable WHERE (pkColumn1, pkColumn2) IN ( (pk1Value1, pk2Value1) , (pk1Value2, pk2Value2) , ... )
Same results, but the query string is shorter.
This was first published in April 2006