Problem solve Get help with specific problems with your technologies, process and projects.

Analysis of intersection frequency

I need to perform one or more queries to do the following. I am trying to get some data on how often two fields in a record are duplicated. For example:

Field 1 (Doctor), Field 2 (Prescription)

(Please note Field 1 and Field 2 are in different tables).

I need to get result set that tells me how often a given doctor prescribes a particular drug. In other words, I want to know if the same doctors are always prescribing the same drugs or not. I don't need to know what all the doctors are doing, but the top 50 or so who have a pattern of prescribing the same drugs over and over would be a step in the right direction.

I could do a separate query for all Doctors x Drugs but clearly this means lots of queries, which means lots of time = not good. Is there a better way???

Separate queries for all Doctor-Drug combinations are not necessary. You said that Doctor and Prescription are in different tables. Presumably the Prescription table will have a field to identify which drug was prescribed, and this will likely be a foreign key to the Drug table, where the Drug name is stored. Similary, the Prescription table will have a foreign key to the Doctor table, where the Doctor name is stored. Thus the Prescription table is an intersection table, implementing the many-to-many relationship between Doctors and Drugs.

So the first thing to do is select doctor-drug combinations in descending order from the Prescription table --

select DoctorID, DrugID
     , count(*) as NumPrescriptions
  from Prescriptions
group by DoctorID, DrugID
order by NumPrescriptions descending

The first time you run this, you'll get back all doctor-drug combinations. If you're using SQL/Server or Access, you can use the TOP keyword (or if you're using MySQL, the LIMIT keyword) to select the top 50 combinations.

What next? You could do joins to the Doctor and Drug tables to get the names of the doctors and the names of the drugs, since the DoctorID and DrugID keys are probably not useful on their own. But that three-way join might run very slowly.

The trick here is to save the results of the "top" query into a temporary table. This will make further queries a lot more efficient.

create table TopPrescriptions
   ( DoctorID integer
   , DrugID integer
   , NumPrescriptions integer )

insert into TopPrescriptions select top 50 DoctorID, DrugID , count(*) from Prescriptions group by DoctorID, DrugID order by NumPrescriptions descending
select DoctorName, DrugName , NumPrescriptions from Doctors, Drugs , TopPrescriptions where Doctors.DoctorID = TopPrescriptions.DoctorID and Drugs.DrugID = TopPrescriptions.DrugID order by DoctorName, DrugName

Notice that grouping is done only on the Prescriptions table (instead of in a three-way join). The subsequent join of the TopPrescriptions table to the Doctor and Drug tables will be very efficient. Further, you can re-sort the results easily.

For More Information

  • What do you think about this answer? E-mail the edtiors at editor@searchDatabase.com with your feedback.
  • The Best SQL Web Links: tips, tutorials, scripts, and more.
  • Have an SQL tip to offer your fellow DBAs and developers? The best tips submitted will receive a cool prize. Submit your tip today!
  • Ask your technical SQL questions -- or help out your peers by answering them -- in our live discussion forums.
  • Ask the Experts yourself: Our SQL, database design, Oracle, SQL Server, DB2, metadata, object-oriented and data warehousing gurus are waiting to answer your toughest questions.

Dig Deeper on Oracle and SQL

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.