Friday, October 21, 2011

Excel Parse CityStateZip Functions

Excel How to:  Parse City-State-Zip fields with an Excel User-Defined functions (UDF). A single cell, such as "Boise, ID 83703" can be parsed into individual fields. A variety of city-state-zip formats, including Canadian, are supported.  These are really smart parsing routines. This article has been streamlined and re-published on this date, with newer versions of the Macro code.

Related Article:
Keyliner: CityStateZip DLL CSZ Class Library - a more sophisticated routine

Parsing city-state-zip fields is complicated because the data can be in a variety of different styles and formats. Consider these examples:

Boise ID 83703
Boise, ID 83703-1111
Boise, ID 83703 1111
Boise,ID 83703
Boise ID 837031111
Post Falls, ID 83703 (multiple-word city)
Post _ _ _Falls, ID _ _ _83703 (embedded, redundant spaces, punctuation)
Salt Lake City UT 12345
Vancouver BC V1B-101 (Canadian)
Vancouver Park BC V1B 101



The Method:
With the code referenced in this article, you can build your own Excel functions.  For example, you are probably familiar with the =Sum() function.  With Excel's macro language, you can write your own function, such as "=ReturnCityName()".

Even with no macro experience, can take the code from this article, add it to your spreadsheet, and use the functions, without having to write or debug the routines.  You should be able to download, apply, and use these routines in about 5 minutes.


Overview:

The macros were written as Excel UDF's (Excel User-defined functions), where you can create your own Excel keywords.  Once installed, the macros can be applied and used like any other function in Excel.  Today, I used these very routines to parse a 200,000 row spreadsheet.  All of this logic happens in under 500 lines of code.

Main Function Names
=ReturnCityName()
=ReturnState()
=ReturnZipCode() Returns entire zipcode, punctuated

=ReturnZipCode5()
=ReturnZipCode4()
=SuperTrim()  (Trims extra leading, trailing and internal spaces)
=ReturnLastWord()


In this illustration, note the single cell with a city-state-zip in cell A5. 
The formula in cell B5 demonstrates the "ReturnCityName".  Other cells show a parsed State and Zipcode.




The routines work by first identifying the zipcode, then backing up, looking for the StateCode. Anything before the State/Province Code is considered a City name. If a postalcode is not found, it looks for a state/province code. The code can return either the string-values or the numeric positions.


See this article for a tutorial on how to build and use User-Defined Excel Functions or you can follow these step-by-step instructions and use the code as-is. 

What it handles
These routines correctly process City-State-Zip (CSZ) strings with these general characteristics:
  • Any type of US or Canadian Postal Code
  • Including ZipCode-5 digits and Zip4
  • Zipcodes can be with or without punctuation
  • City-names can be one or more words
  • State-codes (ID, NY, BC, B.C., ONT), etc.
  • State Names (New York, North Dakota) are supported, if comma-delimited

What it does not handle
With multiple-word state names ('North Dakota', 'British Columbia'), you must have a comma after the city name.

Lincoln City, North Dakota 83700-1234

Single-word state-codes ('ND', 'BC', 'D.C.', 'Idaho', 'Missouri') are handled properly with or without a comma.

Canadian Postal Codes are a mystery to me and forgive me for not knowing them but I believe they are properly accounted for. The code can be punctuated with a hyphen or a space (K1A-1B2 or K01 1B2). The Province Name must follow the state-name rules described above (As an aside, the postal-code is identified by a numeric digit in the 2nd and 7th position, with a space or a hyphen in the middle. The last digit must be numeric in order to detect a US or Canadian postal code.


Installation and Use
Source-code can be downloaded from Keyliner's public GDrive site.  You are welcome to use this code in your personal or commercial projects.

1.  Download two files from this link:
(You do not need to register as a user to download.  Highlight the files, then click the 'Download' button on the top ribbon-bar)

Download Keyliner Excel Macros

Download two files; both are required:
A210_ParseCityStateZip.bas
A800_UtilStrings.bas


2. Save the downloaded files to any directory, such as

C:\Data\Source\CommonVB\A210_ParseCityStateZip.bas
C:\Data\Source\CommonVB\A800_UtilStrings.bas


3. Launch Excel and Enable Excel Macros:

Microsoft considers all macros a security risk and this can be a nuisance.
Do the following to enable the macros.  This is a one-time step for all macros.
  • Launch Excel; click the Orb (formerly the File Menu); choose Excel Options
  • Click Trust Center, Trust Center Settings
  • Macro Settings
  • Enable All Macros
  • Older versions of Office have similar settings.

4.  Open the sheet where you need to parse city-state-zip
or open a blank worksheet to test the routines.  In the Excel sheet, import the VB code using these standard Excel steps:

a.  Click the "orb", Excel Options 
b.  On the left, choose "Popular"
c.  Click [x] Show Developer tab in the ribbon and return to the sheet


5.  Start the VBA Macro Editor by pressing Alt-F11.
The VBA Macro Editor will open in a new window.


6. On the tree-diagram, illustrated below, select your sheet ("Book1"), then:
  • Other-Mouse-Click (Book1),
  • Choose "Import File"
  • Browse and select the previously-saved ".bas" module,
    e.g. C:\Data\Source\CommonVB\A210_ParseCityStateZip.bas
  • You will see the code appear under the "Modules" folder (not illustrated)
  • Repeat this step and import
    C:\Data\Source\CommonVB\A800_UtilString.bas  (the second macro file)

7. Close or minimize the Visual Basic editor by clicking the editor's "X".  This returns you to the sheet and the macros are ready to use. 

Note: All the macro editor did was "attach" the code.  You can optionally choose to view the programming code by using the editor. 

Also, the macros will save with the sheet, but only if you follow the Office 2007/2010 Save warning, documented near the end of this article.



Testing:

A. In Cell A1, type an example City-State-Zipcode for testing. Press Enter when done:

example:  Boise, ID 83703-1234


B.  In cell B1, type this formula:

=ReturnZipCode(A1)

Results: The Zipcode should return.

Try these other functions: 
"=ReturnState()"
"=ReturnCityName()"
"=ReturnZipCode5()"

Possible Errors:
If you get a "#NAME?" when typing =ReturnZipCode, the library was not linked in properly or was not linked to this particular sheet (the macros are only available to the sheets where they were copied).  Review the steps above.  If a routine, such as "SuperTrim" or "FindLastWordPos" fails, confirm you linked in the A800_UtilStrings library.



Excel 2007/2010 Save Warning:

When Excel 2007/2010 saves the sheet, it will save it as an ".XLSX" and will strip the macros out as it saves! (Microsoft is doing this as a security precaution.)



When prompted: 'Do you want to save as a macro-safe) workbook', click "Yes"  or optionally, do a File-Save-As, and choose "Excel Macro-enabled Workbook (*.xlsm).


Descriptions
This article does not explain how this code was developed; that has been left for your study. But the provided code should work well with any version of Microsoft Excel, from version 97 onward. This code has been tested with Office 2007. Converting the code to VB6 or VB.Net should be nearly transparent. All numeric position/counters are base-1, per VisualBasic's norm.

Functions within the A800_UtilStrings routine are of particular interest:
=SuperTrim()
Trims all duplicate interior, leading and trailing spaces. This function is potentially helpful in a wide variety of situations, above and beyond CSZ parsing.

=FindLastSpacePos()
Returns the position of the last space, base-1; returns a numeric value

=ReturnFirstWord()
=ReturnLastWord()
Returns the first and last word in a string; properly trimmed.

=FindStatePos()
Returns the State-code/State-name's numeric position in the string; base-1. Other functions return similar values:

=ReturnCityName()
=ReturnState()
=ReturnZipCode()
=ReturnZipCode4 ()

=GetCSZPositions

Returns 4 numeric values, showing the position of each of the CSZ's subparts. See the comments near the top of the code for a complete description.

Other functions, such as =FindState, return the numeric positions and are provided as a convenience.

Comments welcome.

Link
Related articles:
Excel Macros - Formatting Phone Numbers
Excel Macros to Parse City State ZipCodes
Excel Macros to return First Word, Last Word, Supertrim, Count Words
Writing Excel UDF functions
Excel VLookup - A complete Tutorial
Excel Coloring Alternate Rows
Excel Importing Leading Zeroes

1 comment:

  1. Keyliner now has an FTP Site (see links above) and you can download the code directly. This is recommended for all browser users because this preserves indentation and other formatting.

    ReplyDelete

Comments are moderated and published upon review.