I am one of your biggest fans and I really appreciate all your work for the Oracle community; I am new to Oracle and I have a problem I need help with.
I have two tables with two columns. They both have values that I need to find the similar values of. For example, say the first column has Jeff Shapiro clinical and the second column in the second table may have something like clinic of Jeff Shapiro.
How can I achieve that and find the similar values between the two tables-- is SQL better or PL/SQL? Do we use like contains or regexp? Will a domain index help here?
Wow! I have a fan! Aside from my family, that makes you pretty much my #1 fan.
The last time I saw this class of problem addressed was in the marketing industry for candies. Manual Data Entry injects all manner of “similar-but-different” strings, from differing abbreviations to misspellings. The algorithm was very complex and sophisticated, yet was fast enough (in C) to process millions of records.
- First, we parsed out all of the tokens.
- Then we normalized the tokens from a dictionary, converting abbreviations and shortenings to a standardized abbreviation. We threw away connector words like “the”. Our dictionary had each synonym and their standardized token, i.e., STICK<-STK,STIK,SK.
- If an exact match wasn’t located, we scanned the token one character at a time, granting a weight to each character position. If the sums of the weights fell within 95% of the nearest standardized keyword, we considered it a match. Otherwise, we tossed it to a human to discern the token. Humans are really good at taking incomplete information and discerning the completed pattern.
- Then the completed set of converted tokens could be analyzed for matches (and in no particular order, too).
The only thing I can add is, it isn’t going to be easy.
Have a question for Dan Clamage? Send an e-mail to email@example.com
Dig Deeper on Using Oracle PL-SQL
Related Q&A from Dan Clamage
A reader asks Oracle expert Dan Clamage about a single date condition mucking up a dynamic query. Continue Reading
Expert Dan Clamage explains how to use SQL SELECT and SQL UNION ALL statements to sort and visualize a set of sales figures. Continue Reading
One reader asks expert Daniel Clamage about the PL/SQL to_date and to_char functions and how to properly convert date and string values. Continue Reading