
Match two words in a column

I have two tables, and both have a column called full_name. I want to match at least two words in the full name column. For example, "John Alder Smith" and "John F Smith" is a match, while "Peter Duncan Doyle" and "Peter Parker" is not a match. Thanks for your help.

The SQL part of the solution to this problem is to use a cross join, and then a WHERE clause to do the match:

```sql
select t1.full_name as t1_name
     , t2.full_name as t2_name
  from table_one as t1
cross
  join table_two as t2
 where t1.full_name resembles t2.full_name
```

Why a cross join? Just to acknowledge that we are comparing every name in table one to every name in table two in order to find matches. At least, that's what the problem sounded like to me.

Now, about this mysterious resembles operator. Of course, there is no such thing, at least not tailor-made to match at least two words in two columns. Our challenge now is to find a way to do this with SQL.

The tough part is breaking up the name into words. For the general case, an unknown number of words, we might employ an auxiliary integers table, but let's assume that for full names, four words (names) is a good working maximum, as this simplifies the query a bit.

```sql
select t1.full_name as t1_fullname
     , t2.full_name as t2_fullname
  from table_one as t1
cross
  join table_two as t2
 where case when ' '||t1.full_name||' '
                 like '% '||word(t2.full_name,1)||' %'
            then 1 else 0 end
     + case when ' '||t1.full_name||' '
                 like '% '||word(t2.full_name,2)||' %'
            then 1 else 0 end
     + case when ' '||t1.full_name||' '
                 like '% '||word(t2.full_name,3)||' %'
            then 1 else 0 end
     + case when ' '||t1.full_name||' '
                 like '% '||word(t2.full_name,4)||' %'
            then 1 else 0 end
       >= 2
```

Each CASE expression compares one word from t2.full_name to the complete t1.full_name. Each LIKE comparison has two terms, and each term has a space added at both its front and its back, like this:

`' Peter Duncan Doyle ' like '% Peter %'`

We need spaces around the name Peter in the right term, '% Peter %', because we don't want to match Peterson. We therefore also need to append a space to both the front and back of the left term, the entire full name, in order to find a word at the beginning or end of the full name.
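The effect of the padding can be checked in isolation. The following is a minimal sketch, assuming a database that accepts SELECT without a FROM clause (PostgreSQL, MySQL and SQLite do; in Oracle, add `from dual`). The name "Peterson Doyle" and the column aliases are made up for illustration.

```sql
select case when ' Peter Duncan Doyle ' like '% Peter %'
            then 1 else 0 end as padded_whole_word   -- 1: ' Peter ' is found
     , case when ' Peterson Doyle ' like '% Peter %'
            then 1 else 0 end as padded_longer_word  -- 0: 'Peter' is followed by 's', not a space
     , case when 'Peter Duncan Doyle' like '% Peter %'
            then 1 else 0 end as unpadded_name       -- 0: no space in front of the first word
```

The third column shows why the left term needs the padding too: without it, the first and last words of a full name can never match.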

Thus we test the first four words of t2.full_name, and for every word found within t1.full_name, we add 1 to a total. And if this total is 2 or more, the full names are considered to match.
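For names that may run longer than four words, the repeated CASE expressions could in principle be replaced by the auxiliary integers table mentioned earlier. What follows is only a sketch under assumptions: `integers` is a hypothetical table with a single column `n` holding 1, 2, 3 and so on, up to the largest word count you expect, and `word` is the same placeholder function as in the query above, assumed here to return NULL when the name has fewer than `n` words.

```sql
select t1.full_name as t1_fullname
     , t2.full_name as t2_fullname
  from table_one as t1
cross
  join table_two as t2
cross
  join integers as i                -- hypothetical: one row per n = 1, 2, 3, ...
 where ' '||t1.full_name||' '
       like '% '||word(t2.full_name, i.n)||' %'   -- a NULL word drops the row
group
    by t1.full_name
     , t2.full_name
having count(distinct i.n) >= 2     -- at least two word positions matched
```

Counting distinct values of `i.n` rather than rows keeps the result correct when the same full name appears more than once in either table. Unlike the four-CASE version, such repeated names come back as a single row per pair.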

The only thing we haven't done yet is explain how to extract the separate words out of t2.full_name. As you probably guessed, there is no standard WORD function. Depending on your database system, you might use a combination of nested POSITION and SUBSTRING functions to extract each word, based on how many spaces you detect in the full name going from left to right. Granted, by the time you get to the CASE expression for the fourth word, with POSITION and SUBSTRING functions nested four deep, it does get ugly. For this reason, check whether your database system offers other string handling functions that make this part easier; MySQL, for example, has the SUBSTRING_INDEX function. And if your database system allows you to declare a user-defined function, you could write your own WORD function and use it exactly as shown above.
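As one possible illustration, here is a sketch of a user-defined WORD function for MySQL, built on SUBSTRING_INDEX. It is not the only way to write it, and it rests on assumptions: words are separated by exactly one space, there are no leading or trailing spaces, and the `varchar(255)` length is an arbitrary choice. The word-count guard matters because SUBSTRING_INDEX returns the whole string when asked for more words than exist, so without the guard the last word would be returned again for every higher position and could be counted more than once.

```sql
-- MySQL sketch of a hypothetical word(s, n): returns the n-th
-- space-separated word of s, or NULL if s has fewer than n words.
delimiter //
create function word(s varchar(255), n int)
returns varchar(255)
deterministic
begin
  declare word_count int;
  -- number of words = number of spaces + 1 (assumes single spaces)
  set word_count = char_length(s) - char_length(replace(s, ' ', '')) + 1;
  if n < 1 or n > word_count then
    return null;
  end if;
  -- keep the first n words, then take the last word of that prefix
  return substring_index(substring_index(s, ' ', n), ' ', -1);
end //
delimiter ;

select word('John Alder Smith', 2);   -- Alder
select word('Peter Parker', 3);       -- NULL
```

Note that in MySQL's default SQL mode `||` means logical OR rather than concatenation, so the LIKE terms in the matching query would need to be written with CONCAT, for example `concat(' ', t1.full_name, ' ') like concat('% ', word(t2.full_name, 1), ' %')`.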

This was last published in October 2005
