[nycphp-talk] (ir) regular expressions (stupid me)

Dan Cech dcech at phpwerx.net
Sat Apr 24 10:49:22 EDT 2004


Jayesh Sheth wrote:
> So what I am saying, is that I need to check for two commas, an alpha 
> numeric string before the first comma, a capitalized city name after the 
> first comma, and two capital letters after the second comma (for the 
> state). I will ignore whitespace before or after the commas. (That 
> whitespace can be trimmed programmatically).

You could make your life easier by trimming the whitespace before you 
start to validate the address.

> I find POSIX style expressions using the ereg() function to be a bit 
> easier to learn than their Perl equivalents. Here is what I came up with 
> using an ereg() expression:

Learning the preg_* syntax is well worth the trouble: the preg functions 
are usually considerably faster than ereg().

> ^([[:alnum:]]+[\.]{0,}[[:space:]]{0,}){1,},([[:space:]]{0,}[[:upper:]][[:alpha:]]+),([[:space:]]{0,}[[:upper:]]{2})$ 
> 

$address = '123 Elm St., Brooklyn, NY';
// remove any whitespace around commas
$address = preg_replace ('/\s*,\s*/',',',$address);
// check address
if (preg_match ('/^([0-9]+ [\w\s]+)[.]?,([A-Z][a-z]+),([A-Z]{2})$/',
                $address, $matches)) {
   $street = $matches[1];
   $city   = $matches[2];
   $state  = $matches[3];
   $address = $street . ', ' . $city . ', ' . $state;
}

That preg expression is similar in intent to the one you were already 
using. As written, [A-Z][a-z]+ only accepts a single-word city, so it 
should be extended to deal with city names like 'Salt Lake City' etc.

One way would be:

'/^([0-9]+ [\w\s]+)[.]?,([A-Za-z\s]+),([A-Z]{2})$/'
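
As a rough check of that extended pattern, here is a minimal sketch that 
wraps the normalise-then-match steps in a function. The name 
parse_address and the sample address are made up for illustration; they 
are not from the original question.

```php
<?php
// Hypothetical helper: normalise the commas, then match the extended pattern.
function parse_address($address)
{
    // remove any whitespace around commas, and at both ends of the string
    $address = preg_replace('/\s*,\s*/', ',', trim($address));

    $pattern = '/^([0-9]+ [\w\s]+)[.]?,([A-Za-z\s]+),([A-Z]{2})$/';
    if (preg_match($pattern, $address, $matches)) {
        return array(
            'street' => $matches[1],
            'city'   => $matches[2],
            'state'  => $matches[3],
        );
    }
    return false; // address did not fit the "street, city, ST" layout
}

// a multi-word city with untidy spacing around the commas
$parts = parse_address('350 S State St. , Salt Lake City ,UT');
```

Note that [A-Za-z\s]+ is deliberately loose: it also accepts a city that 
starts with a lower-case letter, so tighten it if that matters to you.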

You may in fact want to go with a three-input form and grab the street, 
city and state separately, unless you can guarantee that the user will 
put in the commas.
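
If you do go the three-input route, the checks could look something like 
the sketch below. The function name validate_address_fields and the 
exact per-field rules are my assumptions, loosely based on the pieces of 
the pattern above, so adjust them to whatever your form really needs.

```php
<?php
// Hypothetical validator for three separate form fields.
// Returns the names of the fields that failed; an empty array means all passed.
function validate_address_fields($street, $city, $state)
{
    $errors = array();

    // street: house number, a space, then words, with one optional trailing full stop
    if (!preg_match('/^[0-9]+ [\w\s]+[.]?$/', trim($street))) {
        $errors[] = 'street';
    }
    // city: one or more words of letters separated by single spaces
    if (!preg_match('/^[A-Za-z]+( [A-Za-z]+)*$/', trim($city))) {
        $errors[] = 'city';
    }
    // state: exactly two capital letters
    if (!preg_match('/^[A-Z]{2}$/', trim($state))) {
        $errors[] = 'state';
    }
    return $errors;
}

$ok  = validate_address_fields('123 Elm St.', 'Brooklyn', 'NY');
$bad = validate_address_fields('Elm St.', 'Salt Lake City', 'ny');
```

With separate fields there are no commas to worry about, and you can 
tell the user exactly which field needs fixing.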

> B) In the second case, what I want to check for seems to be much 
> simpler, but I am having no luck.

You can change the above preg expression by removing the [.]?, which 
will disallow the . character in the address altogether. However, as you 
can see in the above expression, the [.]? sits outside the parentheses 
that capture the street, so the . is discarded anyway if it is present.
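
A small sketch of both points, using the extended pattern from above. 
The variable names and the sample addresses are only for illustration.

```php
<?php
$pattern = '/^([0-9]+ [\w\s]+)[.]?,([A-Za-z\s]+),([A-Z]{2})$/';

preg_match($pattern, '123 Elm St.,Brooklyn,NY', $with_dot);
preg_match($pattern, '123 Elm St,Brooklyn,NY', $without_dot);

// both lines print "123 Elm St": the optional [.] is outside the parentheses
echo $with_dot[1], "\n";
echo $without_dot[1], "\n";

// with [.]? removed, a trailing full stop makes the whole match fail
$strict   = '/^([0-9]+ [\w\s]+),([A-Za-z\s]+),([A-Z]{2})$/';
$rejected = preg_match($strict, '123 Elm St.,Brooklyn,NY'); // 0, no match
```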

Otherwise, you would have to do a negative check for 'St.', i.e. look 
for 'St.' first and, only if it is not found, look for 'St' without the 
full stop.
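
Roughly, that two-step check could be sketched as below. I am guessing 
at what you want to do with the result, so the function name 
street_suffix_style and its return values are purely hypothetical.

```php
<?php
// Hypothetical helper: report how the "St" abbreviation is written, if at all.
function street_suffix_style($street)
{
    // look for "St." first (followed by whitespace or the end of the string) ...
    if (preg_match('/\bSt\.(\s|$)/', $street)) {
        return 'with full stop';
    }
    // ... and only if that fails, look for a bare "St" as a whole word
    if (preg_match('/\bSt\b/', $street)) {
        return 'without full stop';
    }
    return 'not found';
}

$a = street_suffix_style('123 Elm St.');    // 'with full stop'
$b = street_suffix_style('123 Elm St');     // 'without full stop'
$c = street_suffix_style('123 Elm Street'); // 'not found'
```

The order matters: a bare \bSt\b would also match the 'St' in 'St.', so 
the version with the full stop has to be tested first.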

Hope this helps in some way,

Dan



