Q
Problem solve Get help with specific problems with your technologies, process and projects.

Find duplicate sets of child rows

I have a parent and child tables. The parent has seqno as the primary key and information as to who made a call...

and where and which form they used, etc. The child table has seqno from the parent, the question numbers on the form, and finally the answer to the question. A call is a visit to a store using a specific form which has a variable number of questions depending on the form number. I am looking for a way to find duplicate calls, done by the same person using the same form and the answers being the same. If a single answer is different the row does not qualify as duplicate. Is there a set based solution for this?

From your description, I shall assume the tables look like this:

Calls
seqno personid formid
 301    12      56
 302    16      75
 303    12      56

Answers pseqno question answer 301 1 'y' 301 2 'n' 301 3 'y' 302 1 'n' 302 2 'n' 302 3 'n' 303 1 'y' 303 2 'n' 303 3 'n'

Person 12 filled in form 56 twice, but the answers were different the second time, so you don't want to qualify call 303 as a duplicate of 301.

I would start by joining the combined data from the Answers and Calls tables to itself, based on the question number and also on the parent personid and formid, and with the added condition that the parent seqno values be different, because you don't want to compare a call to itself --

select something
  from ( Answers
   inner 
    join Calls
      on cseqno = seqno ) X
inner
  join ( Answers
   inner
    join Calls
      on cseqno = seqno ) Y
    on X.personid = Y.personid 
   and X.formid   = Y.formid 
   and X.question = Y.question
   and X.seqno    > Y.seqno

Notice that X.seqno is greater than Y.seqno -- if we had used "not equals" then since it's a self-join, each pair would come in twice. By selecting only X.seqno greater than Y.seqno, the later call will be considered the duplicate.

But what do we select? The answer is to count, for each call, the number of answers that don't match.

select X.seqno, ' is duplicated by ', Y.seqno
     , sum( case when X.answer = Y.answer
                 then 0 else 1
            end ) mismatches
  from ...
group 
    by X.seqno, Y.seqno
having sum( case when X.answer = Y.answer
                 then 0 else 1
            end )
     > 0

If the answers are optional, such that one call might have an extra answer than another call for the same person for the same form, then you'll have to use an outer join and COALESCE in the CASE, and probably have to run both inequalities, X.seqno greater than Y.seqno and X.seqno less than Y.seqno. I didn't test that scenario.

For More Information


This was last published in January 2003

Dig Deeper on Oracle and SQL

PRO+

Content

Find more PRO+ content and other member only offers, here.

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

Start the conversation

Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

-ADS BY GOOGLE

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide.com

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close