Problem solve Get help with specific problems with your technologies, process and projects.

How can I find similar values in two Oracle tables in PL/SQL?

Looking to find similar values in two Oracle tables in PL/SQL? Learn about how to do so with this tip from Oracle PL/SQL expert Dan Clamage.

I am one of your biggest fans and I really appreciate all your work for the Oracle community; I am new to Oracle and I have a problem I need help with.

I have two tables with two columns. They both have values that I need to find the similar values of. For example, say the first column has Jeff Shapiro clinical and the second column in the second table may have something like clinic of Jeff Shapiro.

How can I achieve that and find the similar values between the two tables-- is SQL better or PL/SQL? Do we use like contains or regexp? Will a domain index help here?

Wow! I have a fan! Aside from my family, that makes you pretty much my #1 fan.

The last time I saw this class of problem addressed was in the marketing industry for candies. Manual Data Entry injects all manner of “similar-but-different” strings, from differing abbreviations to misspellings. The algorithm was very complex and sophisticated, yet was fast enough (in C) to process millions of records.

  •  First, we parsed out all of the tokens.
  • Then we normalized the tokens from a dictionary, converting abbreviations and shortenings to a standardized abbreviation. We threw away connector words like “the”. Our dictionary had each synonym and their standardized token, i.e., STICK<-STK,STIK,SK.
  •  If an exact match wasn’t located, we scanned the token one character at a time, granting a weight to each character position. If the sums of the weights fell within 95% of the nearest standardized keyword, we considered it a match. Otherwise, we tossed it to a human to discern the token. Humans are really good at taking incomplete information and discerning the completed pattern.
  • Then the completed set of converted tokens could be analyzed for matches (and in no particular order, too).

The only thing I can add is, it isn’t going to be easy.

Have a question for Dan Clamage? Send an e-mail to editor@searchoracle.com


Dig Deeper on Using Oracle PL-SQL

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.