Saturday, June 1, 2013

Excel Parse First Name, Last Name

How To: Reliably Parse FirstName, LastNames in Excel using a User Defined Function (UDF Macro). A single cell may contain a person's full name, as in "John Q. Smith".  Use Excel to parse out the constituent parts: FirstName, LastName, MidName, Suffix and the salutation.  New version 2013.06.11 includes support for prefix-Salutations and multiple suffixes + minor bug fixes.

Preamble:
Parsing a person's name into sub-parts is difficult. There are issues when the person's first-name is an initial plus their second name, as in "J. Alan Smith".   Other names, such as Last Names, may be composed of two words, as in "Alan Van Doran", "Bobby Mc Farland" or "Jill Saint John".  A standard parsing routine may confuse "Van", "Mc" and "St." as middle-names and would assign only the "J" to J Alan's first name.

The macros referenced in this article see through all types of naming issues and are able to parse most (English) names into their constituent parts:

FirstName
LastName
MidName
Suffix (Jr., Sr., PH.D, III, MD, etc.)
Prefix / Salutations (Mr., Mrs, Senator, Cpt., The Right Reverend, etc.)


What it Handles:

These routines correctly interpret all manner of English (and likely other language) names, including those with:
  • First-letter initials, punctuated or not  (J. Smith, J Smith)
  • First-letter initials, with a second name assumed as first= ("J. Alan")
  • Two-letter firstnames (JP Morgan)
  • Two-letter first names plus mid-names (JP Alan Morgan)
     
  • Hyphenated first and last-names
  • Names, such as Van, Vander, Mc, Mac, Saint, Le, Le', De, etc.
  • Common Suffixes (Jr., Sr., "J. Smith III", PH.D, PHD, MD, M.D., DDS, etc)
  • Common Suffixes, punctuated with/without commas (J. Smith, Jr. returns "Smith")
  • Multiple-word suffixes, punctuated with at least one comma are detected properly
     
  • Names with extra embedded / internal spaces
  • Three, four and five-word names (Mary-Anne Lynn Miller, etc.)
  • Single-word names ("Smith", "Cher", assumes LastName)
  • Can separate common and uncommon Salutations, such as Mr., Mrs, Lt., Fr., Reverend, etc
  • All names are returned in the upper/lower case, as typed.

Where this routine struggles:

There are some areas where even humans have a hard time interpreting.
  • Names such as "Mary Ann Smith", where "Mary Anne" is the first name.  Ann will be tracked as a middle-name. 

    However, "Mary-Ann Smith" will parse Mary-Ann as a first name.  If you have control over the original data and can insert a tilde~, "Mary~Ann Smith" will parse as "Mary Ann" as the first name, with the tilde removed.  This is a programming trick which might be useful.
     
  • Similarly, "Kathy Smith Jones" will parse as a mid-name "Smith" and a last-name "Jones". 

    Hyphenated last names, such as "Kathy Smith-Jones", will parse correctly with the last name as "Smith-Jones". 
     
  • Common multiple-word suffixes, without a leading comma are detected with less accuracy.  Uncommon suffixes, such as many educational-degrees, are not detected and will be incorrectly treated as last-names.  However, this logic is easily modified.  I would appreciate a comment with all of the possible suffixes.
Other comments:

If the source last name has a special character (~ tilde), "Kathy Smith~Jones", the last name will be parsed as "Smith Jones" (with the tilde removed) - this is a programming trick that may help you if you have control over the name assembly. 

If the name has a middle-initial, the last name will always parse correctly, no matter how long or how poorly punctuated, as in "Kathy Q. Smith Jones".  Middle initials are detected as single-character values, with or without punctuation (Q, Q.).



Here are some normal and unusual names this routine can handle:

John Smith
John Q. Smith
John Q Smith, Jr.
John Q Smith,Jr  (No space after comma)
John Q. Martin Smith, III  (simple suffix)
John Martin Jr. (simple suffix, no comma)
John Martin, Jr. MD  (suffix with leading comma: "Jr. MD")


Smith (Missing first and midnames, detects as last name)
J Smith
J. Smith
J. Alan Smith
J Alan Smith
J.P. Smith  (first-name = "J.P.")
JP Albion Smith
JP Albion Smith PHD
 
John Mac Neal (LastName = Mac Neal)
John Mc Neal (LastName = Mc Neal)
John O' Brian (LastName = O' Brian, O' Leary, etc.)
   John  Mc   Neal MD  (extra embedded spaces)

   (various Le', De' and other such names, with/without punctuation)
John Van Nuys  (LastName = Van Nuys, also Vander, Von, etc.)
Mac Hetherington (LastName = Hetherington, not Mac Hetherington)

Mary-Anne Jacob Smith MD

Tom Smith, M.D.
Mary Ann Nichole Smith (Ann as middle name; this may or may not be correct)
Mary Q Anne Nichole Smith Barney (a mess. LastName = Anne Nichole...Barney)
Mac Von Nuys, Jr (lastName = Von Nuys)

Lori Vander Hoff (LastName = Vander Hoff)


Numerous Prefixes are detectable:
The Hon Rev Thomas Smith  (First name Thomas, Last name Smith)

Cpt. John Smith
Mr. Allen Jones


Overview:

With the code referenced in this article, you can write your own Excel functions.  For example, you may be familiar with Excel's =Sum().  With Excel's macro language, you can write a function, such as "=ReturnLastName()", essentially inventing your own keywords.  These are called User Defined Functions (UDF). 

Even with no macro experience, the code from this article can be downloaded and added to your spreadsheet, ready for use, in under 5 minutes.  This can be done without writing a line of code. Once installed, the macros can be applied and used like any other Excel function.  Today, I used these very routines to parse a 200,000 row spreadsheet.

See this article for a tutorial on how to build and use User-Defined Excel Functions or follow these step-by-step instructions and use the code as-is.  
    Functions:

    =ReturnSalutation()
    =ReturnSuffixName()
    =ReturnFirstName()
    =ReturnLastName()
    =ReturnMidName()

    Also referenced are utility functions, such as "=SuperTrim" and "=ReturnLastWord", which are found in the A800_UtilStrings.bas module, downloaded below.



    Installation and Use

    Source-code can be downloaded from the Keyliner public Gdrive site  (the code is too ungainly to post in a blog).  You are welcome to use these routines in your personal or commercial projects; as a courtesy, please leave the author comments in the code.

    1.  Download two files from this link:
    You do not need to register or login in order to download.  Highlight the files, then click the 'Download' button on the top ribbon-bar.

    From this link, download these two files; both are required for this article: 
    Keyliner Public Gdrive - Excel Macros

    A240_ParseNames.bas
    A800_UtilStrings.bas


    2. Save the downloaded files to any directory, such as

    C:\Data\Source\CommonVB\A240_ParseNames.bas
    C:\Data\Source\CommonVB\A800_UtilStrings.bas

    (or any other directory of your choosing)

    3. Launch Excel and Enable Excel Macros.

    Microsoft considers all macros a security risk and this can be a nuisance.
    Do the following to enable the macros.  This is a one-time step for all macros.
    • Launch Excel; click the Orb (formerly the File Menu); choose Excel Options
    • Click Trust Center, Trust Center Settings
    • Macro Settings
    • Enable All Macros
    • Older versions of Office have similar settings.
    See also this article, which summarizes these steps:  Enable Excel Developer Options

    4.  Open the sheet where you need to parse Names or open a blank worksheet to test the routines.

    5.  Expose Developer Options

    Office 2008
    a.  Click the "orb" (or File Menu), Excel Options 
    b.  On the left, choose "Popular"
    c.  Click [x] Show Developer tab in the ribbon
    d.  Click OK and return to the sheet

    Office 2010
    a.  Click File, Excel Options, "Customize Ribbon"
    b.  In the second-column ("Customize Ribbon: Main Tabs"), check [x] Developer
    c.  Click OK and return to the sheet


    6.  Import the Downloaded Excel Macros
    a.  Within the sheet, start the VBA Macro Editor by pressing Alt-F11.  The VBA Macro Editor will open in a new window.

    b. On the tree-diagram, illustrated below, select your sheet ("Book1"), then:
    • Other-Mouse-Click (Book1),
    • Choose "Import File"
    • Browse and select the previously-saved ".bas" module,
      e.g. C:\Data\Source\CommonVB\A240_ParseNames.bas
    • You will see the code appear under the "Modules" folder (not illustrated)
    c.  Repeat the steps directly above
    • Import C:\Data\Source\CommonVB\A800_UtilString.bas  (the second macro file)

    7. Close the Visual Basic editor by clicking the editor's "X". 

    This returns you to the sheet and the macros are ready to use. 

    Note: All the macro editor did was "attach" the code.  You are welcome to review the code from within the editor. 

    Important: The macros will not save with the sheet unless you follow the Office 2007/2010 Save warning, documented near the end of this article.




    Initial Testing

    A.  Type a name "John Q. Smith" in cell A1.  Press Enter to commit.
    B.  In Cell B1, type this formula:

    =ReturnFirstName(A1)

    Note:  You are not "running" these macros; they are executed as a function within the sheet.

    You are done!
     

    Note these other routines:
    =ReturnFirstName
    =ReturnMidName
    =ReturnLastName
    =ReturnPrefix
    =ReturnSalutation

    Also of interest, these general purpose utilities:
    =SuperTrim (very cool)
    =ReturnFirstWord
    =ReturnLastWord (and others)



    Dependencies:

    The A240_ParseNames macros has an inter-dependency, where the =ReturnMidName routine depends on the =ReturnFirstName and =ReturnLastName.  In other words, you cannot parse the Mid name without first parsing the First and Last names.  This was done for efficiency and saves those routines from re-parsing previously parsed fields.

    Because of this, build your parsing fields in this recommended order, with these formulas (technically, fields can be in any order, but the MidName formula may end up pointing to a column to the right - which is okay.).  See below for parsing prefix Salutations.

    1.  Parse the Suffix first (this properly detects missing or blank suffixes)
    =ReturnSuffixName(celladdress)

    2.  Parse the FirstName next, using
    =ReturnFirstName(celladdress)

    3.  Parse the LastName next, using
    =ReturnLastName(celladdress)


    4.  Parse the MidName last, using
    =ReturnMidName(celladdress, firstname, lastname)
    Note the inter-dependencies; this field uses two other fields as input

    For example, with a name typed in cell B3 (John Q. Smith, Jr.), build the =ReturnSuffixName formula in cell C3.  Continue with the other formulas, finishing with the =ReturnMidName in cell F3, as illustrated:

    Click illustration for a larger view; click right-X to return

    Once the formulas are built, you can "fill-down" columns C, D, E, and F, to calculate other names in the list. Commonly, once parsed, you would copy all of the names and Paste them using "Paste Special, Values only" -- converting them from formulas to fixed text.

    If you were inclined to look under the hood, you would see the ReturnMidName macro's signature line in this VBA code.  Note the three fields passed into the routine:

    Click illustration for a larger view; click right-X return


    Prefix Salutations Dependencies

    Although uncommon in most listings, you can parse prefixed-salutations from a cell ("Mr, Mrs., Dr., etc).  An ending space, before the first name is required as a delimiter.  These routines properly handle missing prefixes.

    Examples:

    Mr. John Q. Smith
    Mr John Smith, Jr.
    Reverend John Smith
    The Hon Mrs. Mary Anne Jones
    Pvt. Mary Horne


    Common prefixes, such as Dr., Dr, Mr., Mr, are supported with and without periods.  But most multi-word honorifics, such as "The Hon. Mr." do not check for full punctuation (for example, two periods are not tested).  If needed, this behavior is easily changed.

    =ReturnSalutation( )
    The ReturnSalutation verb could not be efficiently incorporated into the original =ReturnLastName, =ReturnFirstName functions, and as-such, it was written as a stand-alone module within the A240 library.  Because of this, you will not be able to run =ReturnLastName, =ReturnFirstName routines against a prefixed name.  The salutation must be removed first.  This is easily solved.  Consider this scenario:

    Column A:  "The Hon Mr. John Q. Smith"

    Column B: =ReturnSalutation(A)   Resulting in "The Hon Mr."

    Column C: =if (Len(B)=0, C, Mid(A, Len(B) + 2, 999))    Resulting in "John Q. Smith";  the +2 skips the trailing space and starts at "J" (John).

    For all LastName, FirstName routines, point them to the non-prefixed Column C.



    This concludes the tutorial on how to use these parsing routines. 


    Possible Macro Errors:
    If you get a "#NAME?" when typing =ReturnLastName, the library was not linked in properly or was not linked to this particular sheet (the macros are only available to the sheets where they were copied).  Review the steps above.  If a routine, such as "SuperTrim" or "FindLastWordPos" fails, confirm you linked in the A800_UtilStrings library.


    Possible Errors:
    Compiler Error, Variable not found
    Starting in Excel 2010, Microsoft is much more particular about having to declare variable types, especially if "Option Explicit" is used or if, in the VBA Editor, Tools, Options, [Editor], [x] Require Variable Declarations.  The Author's code worked properly in Office 2007 and earlier, but failed in 2010.  New versions of the code has been uploaded, fixing this problem.


    Excel 2007/2010 Save Warning:

    When Excel 2007/2010 saves the sheet, it will save it as an ".XLSX" and will strip the macros out as it saves (Microsoft is doing this as a security precaution.)



    When prompted: 'Do you want to save as a macro-safe) workbook', click "Yes"  or optionally, do a File-Save-As, and choose "Excel Macro-enabled Workbook (*.xlsm).

    Caution:
    If you modify the Macros and save the sheet, only this sheet will see the changes.  From the macro editor, re-highlight the A240 and A800 routines and "Export" them back to the 'C:\data\Source\CommonVB' routines, where they were originally stored.  This way, the changes can be used in other spreadsheets.


    Related articles:
    Excel Macros: Formatting Phone Numbers
    Excel Macros to Parse City State ZipCodes
    Excel Macros to return First Word, Last Word, Supertrim, Count Words
    Writing Excel UDF functions
    Excel VLookup - A complete Tutorial
    Parsing Salutations and Salutation lists

    Excel Coloring Alternate Rows
     
    Enable Excel Developer Options

    Originally published: 2011.10.22

    4 comments:

    1. I can't see the macro's to start them. Blank macro window even though I have imported them and can look at the script

      excel 2010

      thanks for any suggestions

      robertlcolby@gmail.com

      ReplyDelete
    2. CoreEquity: If I understood your message correctly, the macro-code was pasted properly.

      These macros are not ones you "Run" from the Macro-Run window. instead, they are called as a formula in a cell, such as =ReturnLastName.

      Hope this helps.

      ReplyDelete
    3. Oh, and don't forget to click the file-menu (orb), "Excel Options", Trust Center, Trust Center Settings, and Enable all macros.

      ReplyDelete
    4. Multiword suffixes are now support, provided they have a leading comma. For example: John Q. Smith, Jr. MD Esq. will parse "Jr. MD Esq" as the suffix. A240 updated on 2013.06.10.

      without the comma, suffixes only detect common (Single-word) suffixes, as in John Q. Smith III (detects "III" properly). Similarly, John Q. Smith Jr. will work.

      I would still appreciate a comprehensive list of suffixes, including multi-worded; I could improve the routine further with this information.

      ReplyDelete

    Comments are moderated and published upon review.