I have a parent and child tables. The parent has seqno as the primary key and information as to who made a call and where and which form they used, etc. The child table has seqno from the parent, the question numbers on the form, and finally the answer to the question. A call is a visit to a store using a specific form which has a variable number of questions depending on the form number. I am looking for a way to find duplicate calls, done by the same person using the same form and the answers being the same. If a single answer is different the row does not qualify as duplicate. Is there a set based solution for this?
From your description, I shall assume the tables look like this:
Calls seqno personid formid 301 12 56 302 16 75 303 12 56
Answers pseqno question answer 301 1 'y' 301 2 'n' 301 3 'y' 302 1 'n' 302 2 'n' 302 3 'n' 303 1 'y' 303 2 'n' 303 3 'n'
Person 12 filled in form 56 twice, but the answers were different the second time, so you don't want to qualify call 303 as a duplicate of 301.
I would start by joining the combined data from the Answers and Calls tables to itself, based on the question number and also on the parent personid and formid, and with the added condition that the parent seqno values be different, because you don't want to compare a call to itself --
select something from ( Answers inner join Calls on cseqno = seqno ) X inner join ( Answers inner join Calls on cseqno = seqno ) Y on X.personid = Y.personid and X.formid = Y.formid and X.question = Y.question and X.seqno > Y.seqno
Notice that X.seqno is greater than Y.seqno -- if we had used "not equals" then since it's a self-join, each pair would come in twice. By selecting only X.seqno greater than Y.seqno, the later call will be considered the duplicate.
But what do we select? The answer is to count, for each call, the number of answers that don't match.
select X.seqno, ' is duplicated by ', Y.seqno , sum( case when X.answer = Y.answer then 0 else 1 end ) mismatches from ... group by X.seqno, Y.seqno having sum( case when X.answer = Y.answer then 0 else 1 end ) > 0
If the answers are optional, such that one call might have an extra answer than another call for the same person for the same form, then you'll have to use an outer join and COALESCE in the CASE, and probably have to run both inequalities, X.seqno greater than Y.seqno and X.seqno less than Y.seqno. I didn't test that scenario.
For More Information
- Dozens more answers to tough SQL questions from Rudy Limeback.
- The Best SQL Web Links: tips, tutorials, scripts, and more.
- Have an SQL tip to offer your fellow DBAs and developers? The best tips submitted will receive a cool prize. Submit your tip today!
- Ask your technical SQL questions -- or help out your peers by answering them -- in our live discussion forums.
- Ask the Experts yourself: Our SQL, database design, Oracle, SQL Server, DB2, metadata, object-oriented and data warehousing gurus are waiting to answer your toughest questions.