[nycphp-talk] A good PCRE expression for matching URLs

Michael B Allen ioplex at gmail.com
Thu Jul 24 14:19:58 EDT 2008


Does anyone have a good PCRE for matching URLs?

All of the examples that I have looked at in various places are too
simple, or exclude invalid characters rather than include valid ones
(and of course fail to exclude all bad characters), or don't escape
properly, etc.

Or perhaps someone can improve (or correct) the expression I'm using currently:

  $expr = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}/~._-]*|mailto:[a-zA-Z0-9\\@.-]+';

The expression breakdown is:

  [a-zA-Z0-9]{1,10} - Protocol specifier (e.g. http, ftps, smb, gopher, ...)
  :// - Protocol host separator (mailto style handled by or condition)
  [a-zA-Z0-9.-]+ - The hostname (currently we assume only ASCII)
  [\p{L}/~._-]* - A UTF-8 path (probably need to allow some other
      chars but not '?')
  |mailto:[a-zA-Z0-9\@.-]+ - Or a mailto URL
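
To make the breakdown above concrete, here is a minimal sketch of how the
expression might be applied with preg_match_all(). The '#' delimiter, the
'u' modifier, the find_urls() helper name and the sample strings are all
assumptions added for illustration; none of them come from the original
expression.

```php
<?php
// Hypothetical helper: returns every substring of $text that the
// expression from the post matches.
function find_urls(string $text): array
{
    // The expression exactly as posted (single-quoted, so \\@ reaches PCRE as \@).
    $expr = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}/~._-]*|mailto:[a-zA-Z0-9\\@.-]+';

    // Assumption: '#' is used as the delimiter because it does not occur in
    // $expr, and 'u' is added so \p{L} is applied to UTF-8 code points.
    $pattern = '#' . $expr . '#u';

    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// Made-up sample input. The path match stops at '?', as the breakdown intends.
print_r(find_urls('See http://www.example.com/docs/index.html?x=1 or mailto:mike@example.com today.'));
// matches: http://www.example.com/docs/index.html and mailto:mike@example.com

// \p{L} covers letters only, so the path class as written has no digits.
print_r(find_urls('http://example.com/page2'));
// matches: http://example.com/page
```

The second call suggests one candidate for the "other chars" mentioned in
the breakdown: digits are not in the path class, so a path is cut at the
first digit. Adding 0-9 (or \p{N}) to that class would be one possible
adjustment.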

Mike

-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/


