[nycphp-talk] Sneaking in unwanted characters

Adam Maccabee Trachtenberg adam at trachtenberg.com
Wed Sep 10 19:26:46 EDT 2003


On Wed, 10 Sep 2003, Chris Shiflett wrote:

> Most of the RFCs either have example regular expressions or a very specific
> grammar that can be used to build one. I've seen that one in the back of the
> O'Reilly book, and my instinct tells me it shouldn't have to be that
> complicated. :-)

The full spec for valid e-mail addresses is really nasty, because you
can have comments inside the address and other weird things. Here is
the regex from PHP Cookbook; it accepts most real-world addresses, but
not everything the spec allows:

/
    ^               # anchor at the beginning
    [^@\s]+         # name is all characters except @ and whitespace
    @               # the @ divides name and domain
    (
        [-a-z0-9]+  # (sub)domains are letters, numbers, and hyphens
        \.          # separated by a period
    )+              # and we can have one or more of them
    (
        [a-z]{2}    # TLDs can be a two-letter alphabetical country code
        |com|net    # or one of 
        |edu|org    # many 
        |gov|mil    # possible
        |int|biz    # three-letter
        |pro        # combinations
        |info|arpa  # or even
        |aero|coop  # a few 
        |name       # four-letter ones
        |museum     # plus one that's six-letters long!
    )
    $               # anchor at the end
/ix                 # and everything is case-insensitive
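
For anyone who wants to try the pattern, here is a minimal sketch of
wrapping it in preg_match(). The constant EMAIL_PATTERN and the function
looks_like_email() are names made up for this illustration, not anything
from the book, and the TLD list is just the one above, so addresses under
any other TLD are rejected:

```php
<?php
// Hypothetical wrapper around the pattern above. A nowdoc keeps the
// backslashes literal; /x lets the regex carry its own comments.
const EMAIL_PATTERN = <<<'REGEX'
/
    ^
    [^@\s]+            # name: anything except @ and whitespace
    @
    ( [-a-z0-9]+ \. )+ # one or more (sub)domains, each followed by a period
    (
        [a-z]{2}       # two-letter country code
        |com|net|edu|org|gov|mil|int|biz|pro
        |info|arpa|aero|coop|name
        |museum
    )
    $
/ix
REGEX;

// Returns true when $address matches the pattern, false otherwise.
function looks_like_email(string $address): bool
{
    return preg_match(EMAIL_PATTERN, $address) === 1;
}

var_dump(looks_like_email('adam@trachtenberg.com'));   // matches
var_dump(looks_like_email('bad address@example.com')); // whitespace in name
var_dump(looks_like_email('user@localhost'));          // no dot-separated TLD
```

This only tells you the string has a plausible shape; it says nothing
about whether the mailbox actually exists.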


Alternatively, check out imap_rfc822_parse_adrlist().
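
A rough sketch of that route, assuming the IMAP extension is loaded (it
is not part of a default build). The helper parse_address_list() and the
'example.com' default host are placeholders invented for this example;
the function hands back null when the extension is missing:

```php
<?php
// Hypothetical helper: parse an RFC 822 address list into "mailbox@host"
// strings. Returns null when the IMAP extension is unavailable.
function parse_address_list(string $list, string $defaultHost = 'example.com'): ?array
{
    if (!function_exists('imap_rfc822_parse_adrlist')) {
        return null;
    }

    $addresses = [];
    foreach (imap_rfc822_parse_adrlist($list, $defaultHost) as $entry) {
        // Each entry is an object; keep only those with both parts set.
        if (isset($entry->mailbox, $entry->host)) {
            $addresses[] = $entry->mailbox . '@' . $entry->host;
        }
    }
    return $addresses;
}

var_dump(parse_address_list('Adam <adam@trachtenberg.com>, webmaster'));
```

Addresses without a host part pick up the default host passed as the
second argument.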

-adam

-- 
adam at trachtenberg.com
author of o'reilly's php cookbook
avoid the holiday rush, buy your copy today!



