Problem solve Get help with specific problems with your technologies, process and projects.

Find duplicate sets of child rows

I have a parent and child tables. The parent has seqno as the primary key and information as to who made a call and where and which form they used, etc. The child table has seqno from the parent, the question numbers on the form, and finally the answer to the question. A call is a visit to a store using a specific form which has a variable number of questions depending on the form number. I am looking for a way to find duplicate calls, done by the same person using the same form and the answers being the same. If a single answer is different the row does not qualify as duplicate. Is there a set based solution for this?

From your description, I shall assume the tables look like this:

seqno personid formid
 301    12      56
 302    16      75
 303    12      56

Answers pseqno question answer 301 1 'y' 301 2 'n' 301 3 'y' 302 1 'n' 302 2 'n' 302 3 'n' 303 1 'y' 303 2 'n' 303 3 'n'

Person 12 filled in form 56 twice, but the answers were different the second time, so you don't want to qualify call 303 as a duplicate of 301.

I would start by joining the combined data from the Answers and Calls tables to itself, based on the question number and also on the parent personid and formid, and with the added condition that the parent seqno values be different, because you don't want to compare a call to itself --

select something
  from ( Answers
    join Calls
      on cseqno = seqno ) X
  join ( Answers
    join Calls
      on cseqno = seqno ) Y
    on X.personid = Y.personid 
   and X.formid   = Y.formid 
   and X.question = Y.question
   and X.seqno    > Y.seqno

Notice that X.seqno is greater than Y.seqno -- if we had used "not equals" then since it's a self-join, each pair would come in twice. By selecting only X.seqno greater than Y.seqno, the later call will be considered the duplicate.

But what do we select? The answer is to count, for each call, the number of answers that don't match.

select X.seqno, ' is duplicated by ', Y.seqno
     , sum( case when X.answer = Y.answer
                 then 0 else 1
            end ) mismatches
  from ...
    by X.seqno, Y.seqno
having sum( case when X.answer = Y.answer
                 then 0 else 1
            end )
     > 0

If the answers are optional, such that one call might have an extra answer than another call for the same person for the same form, then you'll have to use an outer join and COALESCE in the CASE, and probably have to run both inequalities, X.seqno greater than Y.seqno and X.seqno less than Y.seqno. I didn't test that scenario.

For More Information

Dig Deeper on Oracle and SQL

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.