2024-01-21

CityStateParse - Parse single-line Addresses into CityStateZip

CityStateParse - Parse single-line ASCII street addresses into separate Address-City-State-Zip fields, writing results to a tab-delimited text file.

Mailing addresses can be punctuated with commas or not, and the program sees through misspellings and other typos.  It can be run interactively or in an unattended batch mode.

For example:

"123 North Elm Street Suite 102 Boise ID 83701"

is parsed and returned as:
Address:  123 N. Elm Street Suite 102
City: Boise
State: ID
Postal Code: 83701

CityStateParse.exe is a Windows program that works with both punctuated and unpunctuated addresses.  It can navigate misspelled, poorly-delimited, and otherwise crappy addressing data.  It does not rely on postal-code lookup tables.

Using an internal algorithm, it snakes through the line and teases out the proper fields.  Using terminology from the 1990s, it uses "fuzzy logic."  To be more modern, I can joke and say it uses "AI."

With the on-screen panel, you can play with and experiment with single addresses, parsing one at a time, viewing the results on-screen. 

More commonly, feed it a larger ASCII transaction file.  In a few seconds, results are returned as a tab-delimited file that can be read with Excel or other programs.  The conversion can be started from the panel or run unattended from the command line.




Formerly, this program crunched only CityStateZip fields (CSZ), but now handles CSZ plus addresses.  This illustration gives an indication of some of the issues it can dance around.


With street addresses, it correctly parses a litany of St. (Saint), St, Street, Ave, Blvd, Court, Cir, Drives, as well as Suites, Ste., Apt, Units, etc.
It sees many misspelled states, such as Missisippi, Illinoise, Louiseiana, and others.


123 N. Elm St. St. Marys ID 83701  (street vs saint)
224 E. Oak St. St. Martin's City Wash. 83701

(There are a lot of cities named "Saint this and Saint that" and this program can figure all of them out.)
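The program's internal algorithm is not published, so the following is only an illustration of how misspelled state names like the ones above could be matched. It is a minimal Python sketch using the standard library's difflib; the function name guess_state_code, the small STATE_CODES sample table, and the 0.8 cutoff are all assumptions made for this example, not part of CityStateParse.

```python
import difflib

# Hypothetical sample table; a real one would list every state and province.
STATE_CODES = {
    "idaho": "ID",
    "illinois": "IL",
    "indiana": "IN",
    "louisiana": "LA",
    "mississippi": "MS",
    "missouri": "MO",
    "washington": "WA",
}


def guess_state_code(word, cutoff=0.8):
    """Return a two-letter code for a possibly misspelled state, or None."""
    key = word.strip(" .,").lower()
    # Already a known two-letter code: pass it through unchanged.
    if key.upper() in STATE_CODES.values():
        return key.upper()
    # Otherwise pick the closest spelled-out name above the similarity cutoff.
    match = difflib.get_close_matches(key, list(STATE_CODES), n=1, cutoff=cutoff)
    return STATE_CODES[match[0]] if match else None


if __name__ == "__main__":
    for sample in ("Missisippi", "Illinoise", "Louiseiana"):
        print(sample, "->", guess_state_code(sample))
```

A similarity cutoff trades recall for safety: lower it and more typos are caught, but unrelated words start to look like states.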

Results:
A tab-delimited file with these columns:
Address
City
StateCode
PostalCode
Country (if present)
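If you would rather read the results with a script than with Excel, a short Python sketch like this one can load the tab-delimited file. The column order comes from the list above; whether the file carries a header row is not stated here, so the has_header switch is an assumption, as is the function name read_parsed_addresses.

```python
import csv

# Column order as listed in this post; Country may be absent.
COLUMNS = ["Address", "City", "StateCode", "PostalCode", "Country"]


def read_parsed_addresses(path, has_header=False):
    """Load a tab-delimited results file into a list of dictionaries."""
    rows = []
    with open(path, newline="", encoding="ascii", errors="replace") as handle:
        # QUOTE_NONE keeps quotation marks in the data as ordinary characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for index, fields in enumerate(reader):
            if not fields or (has_header and index == 0):
                continue
            # Pad short rows so a missing Country becomes an empty string.
            fields = fields + [""] * (len(COLUMNS) - len(fields))
            rows.append(dict(zip(COLUMNS, fields)))
    return rows
```

Each returned row is a dictionary keyed by column name, so row["City"] and row["PostalCode"] can be used directly.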

Running in Batch:
To parse multiple addresses -- thousands of addresses -- place them in an ASCII text file, one address per line.  From the panel, use the 'Alternate Input File' field (third field on panel), and click "Parse". 

(The input must be a text file with one address (aCSZ) per line, CRLF between records, and no tabs.)


Or pass the input file as a command-line parameter: 

CityStateParse.exe  C:\temp\myaddresses.txt 

When the input file is passed on the command line, the program loads, parses, writes the results, then closes.  The output file is named with a date and arrives in the same directory as the source.  The program must have 'write' permissions to the folder.
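For an unattended run driven from a script, the two steps (build a clean input file, then launch the program with that file as its only parameter) might look like this in Python. This is a sketch under assumptions: the helper names write_input_file and run_batch are made up for the example, and the output file name is not guessed at because its exact format is not documented here.

```python
import subprocess


def write_input_file(addresses, path):
    """Write one address per line, ASCII, CRLF line endings, no tabs."""
    lines = []
    for address in addresses:
        # Collapse tabs, line breaks, and repeated spaces into single spaces.
        cleaned = " ".join(address.split())
        # Replace any non-ASCII character with "?" rather than failing.
        cleaned = cleaned.encode("ascii", errors="replace").decode("ascii")
        if cleaned:
            lines.append(cleaned)
    # newline="" stops Python from translating the explicit CRLF below.
    with open(path, "w", encoding="ascii", newline="") as handle:
        for line in lines:
            handle.write(line + "\r\n")
    return len(lines)


def run_batch(exe_path, input_path):
    """Launch the parser with the input file as its single parameter."""
    # Windows only; blocks until the program has written results and closed.
    return subprocess.run([str(exe_path), str(input_path)], check=False).returncode
```

After run_batch returns, look in the input file's folder for the dated output file.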

"CityStateParse" is a keyliner-developed program that is free for all personal, commercial, and government use.

There is no registration
No email
No nagging
No advertisements
No spying
No cost

------------
 
A.  Download Link:

From Keyliner's public GDrive, click this link and download to a temp or download directory.  Do not download directly into ProgramFiles. 

CityStateParse.exe Download
- Version 2.03 (2024.02).  Contains standalone .exe. 

CityStateParse.exe checksums:

MD5:        5a-37-34-a4-5f-53-47-fb-4e-eb-17-b0-8f-f0-28-3e
SHA256:  abf621944eb8df2d8386fb2ee1383cdf60b403b2c353620ff98b003979cf6034
Len:     169K
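To compare a downloaded file against the published values, any hashing tool will do. As one option, here is a small Python sketch using the standard hashlib module; the function names are made up for this example. It accepts the MD5 in the dash-separated form shown above.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return (md5_hex, sha256_hex) for the file at path."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def normalize(digest):
    """Strip dashes and spaces and lower-case a published digest."""
    return digest.replace("-", "").replace(" ", "").strip().lower()


def matches_published(path, published_md5, published_sha256):
    """True only when both digests match the published values."""
    md5_hex, sha256_hex = file_digests(path)
    return (md5_hex == normalize(published_md5)
            and sha256_hex == normalize(published_sha256))
```

Call matches_published with the path to the downloaded .exe and the MD5 and SHA256 strings listed above; a False result means the file is not the one published here and should not be run.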

Different browsers behave differently when downloading.
You may be warned that the file cannot be scanned.  Click "Download anyway."

Microsoft Edge:
Prompts that "CityStateParse.exe was blocked because it could harm your device" (it is an .exe from the Internet).

Click "See More" and allow the download.

With Microsoft Edge, the downloaded file may appear in your Downloads directory with a temporary name, such as "Unconfirmed 780359.crdownload" (name varies).  Use File Explorer to rename the file to "CityStateParse.exe".

(keyliner.com cannot afford a code-signing certificate, which would help this situation.)

B.  Mark the program as safe-to-run:

(This step may not be needed if downloaded by Edge and you clicked "More / Download Anyway")

Using File Explorer, right-click the downloaded (and renamed) .exe.

Select "Properties"
Check [x] Unblock.  This removes the "mark of the web." 


* Only do this if you trust keyliner *and* only if downloaded from keyliner's public GDrive.   

If "Unblock" is not visible, the file has already been unblocked (by Microsoft Edge).
Once [x] Unblock is checked and applied, this security option disappears.

The program is ready to run:
a) Launch it once (from the download directory) to generate the control .ini file (CityStateArrays.ini).

b) Move the .exe and the .ini file to any directory of your choosing, typically C:\Program Files\Util

Installation: First-time Run:

The download does not need to be installed. 
Download the .exe and run.

For the first-time run, CityStateParse.exe must be run from a directory where users have create rights.  On this first run, the program builds a control .ini file, which contains various city-state rules (the .ini file can also be downloaded separately).

To install, download the .exe and place it in any temp folder on your hard disk.
Launch the program once to create the control .ini file.
Move the executable plus the .ini to a location of your choosing, such as C:\Program Files\Util

The Control .ini File:
CityStateArrays.ini is an editable ASCII text file that contains State-code cross-references, city-prefix and suffix rules (with words like "saint", "grand", "new", "gardens", "springs", etc.), as well as common two- and three-word city names that do not fall into the prefix rules.  These guide the parsing steps.

Usually, this file does not need to be changed.  But if you find a city that does not parse as expected (where part of the city name arrives in the Street Address), make adjustments here.  New state and city rules can be added and take effect when the program loads.  It is a fun file to review.

This file is fragile, with no auditing.  Edit with care.  It follows a fixed format: entries are case-sensitive and blank lines matter.  If the file is damaged, delete it and re-launch the program -- a new default version is re-created.

The program was designed for US and Canadian addresses and probably is not useful in other countries.  This has been tested with tens of thousands of addresses and it is remarkably accurate.  Canadian addresses have been lightly tested.  

Cardinal Comments N S E W:
Considerable effort was spent on address problems with North, South, East, West.  For example, should these cardinals land on the street address or the city name? 
123 Elm East Newark, NJ  (there is a city called East Newark)
4100 E Highway South Ogden, UT  (there is a city called South Ogden)
354 East Hampton North NY (unbelievably not North NY)

The program attempts to solve this by having an actual inventory of all US cities with a cardinal in their name.  If the address has an identifiable street component (such as Suite 103F, Apt 100, the word "Street", etc.), it uses it to help mark where the city name begins.  User-typed punctuation (commas) is too unreliable and is generally ignored.
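To make the inventory idea concrete, here is a toy Python sketch. It is not the program's actual algorithm, which is not published; the two-entry CARDINAL_CITIES sample and the function name split_street_and_city are assumptions made for illustration. The input is the text that remains once the state and postal code have been removed.

```python
# Hypothetical two-entry sample; the real inventory covers all US cities
# with a cardinal direction in their name.
CARDINAL_CITIES = {"east newark", "south ogden"}


def split_street_and_city(text_before_state, max_city_words=3):
    """Split 'street ... city' text into (street, city)."""
    # Commas are treated as plain separators, not as field markers.
    tokens = text_before_state.replace(",", " ").split()
    # Try the longest trailing phrase first, down to two words.
    for size in range(min(max_city_words, len(tokens) - 1), 1, -1):
        candidate = " ".join(tokens[-size:])
        if candidate.lower() in CARDINAL_CITIES:
            return " ".join(tokens[:-size]), candidate
    # No inventory match: only the last word is the city, so any
    # cardinal before it stays with the street address.
    return " ".join(tokens[:-1]), " ".join(tokens[-1:])


if __name__ == "__main__":
    print(split_street_and_city("123 Elm East Newark"))
    print(split_street_and_city("500 Main South Boise"))
```

With this sample inventory, "East" lands in the city for East Newark, while "South" stays with the street in the Boise line because "South Boise" is not in the list.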


Your feedback is welcome.
If you make changes to the .ini file, I would like to know what address you were trying to fix and what changes you made.


Version History:
1.01  Initial Release as a DLL in 2015.  This has been retired and is no longer available.

2.00  2024.01
First Windows version (non-DLL); consumes transaction files

2.03  2024.02
Improved "city name" detection.  It was good before.  Now fantastic.



Related Articles:
At keyliner.com, see other blog entries and other keyliner-developed programs:

DeviceID - free asset Tag Management for home and small businesses
DirectoryPulse - a free and nifty backup program

This program was written using techniques from my programming books, "War and Peace Programming C#" (search Amazon); see War and Peace Programming Volume 6 (Visual Studio C#).

