I have a worksheet where column A has various names in varying formats:
A1 John Smith
A2 Jones, Mary
A3 Sally Gomez
A4 The Gonzalez family
Column B has similar data:
B1 The Smith Family Trust
B2 Bob and Mary Jones
B3 Blackwell, John
B4 Luz Gonzalez
I would like to identify the instances where the same last name is found in column A and column B. In the examples above, the formula, if placed in column C, would result in
C1 TRUE (because "Smith" is found in both A1 and B1)
C2 TRUE (because "Jones" is found in both A2 and B2)
C3 FALSE (because there are no common words between A3 and B3)
C4 TRUE (because "Gonzalez" is found in both A4 and B4)
Is this even possible?
- 31
4 Answers
Given your comments as well as your question, it seems you want to return TRUE if any word in one phrase matches a word in the adjacent phrase. One way to do this is with a User Defined Function (VBA). The following excludes any words that are in arrExclude, which you can add to as you see fit. It will also exclude any characters that are not letters, digits or spaces, and any words that consist of just a single character.
See if this works for you.
Another option would be take a look at the free fuzzy lookup add-in provided by MS for excel versions 2007 and later.
To enter this User Defined Function (UDF), alt-F11 opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.
To use this User Defined Function (UDF), enter a formula like
=WordMatch(A1,B1)
in some cell.
EDIT2: Find Matches segment changed to see if it works better on Mac
Option Explicit
Option Base 0
Option Compare Text
Function WordMatch(S1 As String, S2 As String) As Boolean
Dim arrExclude() As Variant
Dim V1 As Variant, V2 As Variant
Dim I As Long, J As Long, S As String
Dim RE As Object
Dim sF As String, sS As String
'Will also exclude single letter words
arrExclude = Array("The", "And", "Trust", "Family", "II", "III", "Jr", "Sr", "Mr", "Mrs", "Ms")
'Remove all except letters, digits, and spaces
'remove extra spaces
'Consider whether to retain hyphens
Set RE = CreateObject("vbscript.regexp")
With RE
.Pattern = "[^A-Z0-9 ]+|\b\S\b|\b(?:" & Join(arrExclude, "|") & ")\b"
.Global = True
.ignorecase = True
End With
With WorksheetFunction
V1 = Split(.Trim(RE.Replace(S1, "")))
V2 = Split(.Trim(RE.Replace(S2, "")))
End With
'Find Matches
If UBound(V1) <= UBound(V2) Then
sS = " " & Join(V2) & " "
For I = 0 To UBound(V1)
sF = " " & V1(I) & " "
If InStr(sS, sF) > 0 Then
WordMatch = True
Exit Function
End If
Next I
Else
sS = " " & Join(V1) & " "
For I = 0 To UBound(V2)
sF = " " & V2(I) & " "
If InStr(sS, sF) > 0 Then
WordMatch = True
Exit Function
End If
Next I
End If
WordMatch = False
End Function
EDIT: Here is a screenshot of the results, using both your original examples, and also the examples you gave in a comment below where you indicated you were having a problem.

- 9,461
The most difficult part of this exercise is determining what, in column A, constitutes a last name. In your example, it's either:
- The first word, if there's a comma in the whole name
- The second word
If that rule is true, then you can just do a formula like this:
=NOT(ISERROR(FIND(last_name, B1:B4)))
The formula to actually determine the last name is a little more complex. You essentially have to figure out what character positions the spaces are in, and then pull the letters in between. There's a good explanation on this thread:
- 143
Highlight both columns > conditional formatting (home tab) > highlight cell rules > duplicate values.
This will highlight all duplicates in both columns.
Make sure you are highlighting the columns and not the cells.
- 1,725
- 4
- 16
- 28