Parse Usable Street Address, City, State, Zip From A String
Problem: I have an address field from an Access database which has been converted to SQL Server 2005. This field has everything all in one field. I need to parse out the address's individual sections into their appropriate fields in a normalized table. I need to do this for approximately 4,000 records, and it needs to be repeatable.
Assume an address in the US (for now)
assume that the input string will sometimes contain an addressee (the person being addressed) and/or a second street address (i.e. Suite B)
states may be abbreviated
zip code could be standard 5 digits or zip+4
there are typos in some instances
UPDATE: In response to the questions posed, standards were not universally followed; I need need to store the individual values, not just geocode and errors means typo (corrected above)
A. P. Croll & Son 2299 Lewes-Georgetown Hwy, Georgetown, DE 19947
11522 Shawnee Road, Greenwood DE 19950
144 Kings Highway, S.W. Dover, DE 19901
Intergrated Const. Services 2 Penns Way Suite 405 New Castle, DE 19720
Humes Realty 33 Bridle Ridge Court, Lewes, DE 19958
Nichols Excavation 2742 Pulaski Hwy Newark, DE 19711
2284 Bryn Zion Road, Smyrna, DE 19904
VEI Dover Crossroads, LLC 1500 Serpentine Road, Suite 100 Baltimore MD 21
580 North Dupont Highway Dover, DE 19901
P.O. Box 778 Dover, DE 19903
I've done a lot of work on this kind of parsing. Because there are errors you won't get 100% accuracy, but there are a few things you can do to get most of the way there, and then do a visual BS test. Here's the general way to go about it. It's not code, because it's pretty academic to write it, there's no weirdness, just lots of string handling.
(Now that you've posted some sample data, I've made some minor changes)
- Work backward. Start from the zip code, which will be near the end, and in one of two known formats: XXXXX or XXXXX-XXXX. If this doesn't appear, you can assume you're in the city, state portion, below.
- The next thing, before the zip, is going to be the state, and it'll be either in a two-letter format, or as words. You know what these will be, too -- there's only 50 of them. Also, you could soundex the words to help compensate for spelling errors.
- before that is the city, and it's probably on the same line as the state. You could use a zip-code database to check the city and state based on the zip, or at least use it as a BS detector.
- The street address will generally be one or two lines. The second line will generally be the suite number if there is one, but it could also be a PO box.
- It's going to be near-impossible to detect a name on the first or second line, though if it's not prefixed with a number (or if it's prefixed with an "attn:" or "attention to:" it could give you a hint as to whether it's a name or an address line.
I hope this helps somewhat.
- → I can't convert Json to string [OctoberCms]
- → Passing a JS var from AJAX response to Twig
- → how to convert stdclass into string
- → Dynamic url segment name in laravel 5.1
- → Knockout JS - How to return empty strings for observable fields
- → How to replace *( in a string
- → Is this considered to be a custom facade approach in Laravel?
- → Regex extract string after the second "." dot character at the end of a string
- → Htaccess negation
- → Convert generic text string in json object into valid url using Angular