Can you do fuzzy matching with SQL? I want to retrieve a set of results based upon how closely they match to a certain string. For example, if the input string is SMITH, I want to retrieve all similar results, such as SMYTH, AMITH, SMITH, SMYTHE, etc., ideally with a measure of match closeness, e.g., 98%.
The answer is no, not really—but you can come close.
Most databases offer text searching using the familiar LIKE operator. Matches are exact (at best, collation- and case-insensitive), so this is not really a fuzzy search.
Many databases offer fulltext searching. MySQL's is fairy robust (if not always 100% accurate; it ignores words present in more than 50% of all rows, or words shorter than 4 characters). With an ordinary text search, a search for "big dogs" will not return a result such as "dogs with big noses" even though both words that were in the search phrase are present. Fulltext searching overcomes the exact match problem, and may even offer options such as setting a parameter for how "near" one word must be to another.
Fulltext searches may also involve stemming. If you search for "data modelling", stemming would allow the search to return results including "data models" because both "modelling" and "models" have the same stem word.
Some databases offer other special functions, such as the SOUNDEX function in SQL Server. This function would consider SMITH, SMYTH, and SMYTHE to be similar, but not AMITH. Perhaps this is sufficient for your needs.
True fuzzy searching takes into consideration the actual meaning of words. Do a search for "family name" and you should see results with "surname" included. I'm not aware of any database that has this capability in its SQL.
This was first published in October 2006