[nycphp-talk] A good PCRE expression for matching URLs

Michael B Allen ioplex at gmail.com
Thu Jul 24 16:32:18 EDT 2008


On Thu, Jul 24, 2008 at 2:37 PM, John Campbell <jcampbell1 at gmail.com> wrote:
> On Thu, Jul 24, 2008 at 2:19 PM, Michael B Allen <ioplex at gmail.com> wrote:
>> Does anyone have a good PCRE for matching URLs?
>>
>> Or perhaps someone can improve (or correct) the expression I'm using currently:
>>
>>  $expr = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}/~._-]*|mailto:[a-zA-Z0-9\\@.-]+';
>>
>
> I am not sure I completely understand what you are trying to do, but
> it doesn't look like you are matching + or %.

You mean in the path? In the path I suppose I should permit quite a
few more characters (I forgot 0-9 too). This makes the expression:

$expr = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}0-9!$%&\\()+./;=^_~-]*|mailto:[a-zA-Z0-9\\@.-]+';
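
To try an expression like this with the preg functions it needs
delimiters, and \p{L} is only applied to whole UTF-8 characters when
the u modifier is set. Here is a minimal sketch, assuming '#' as the
delimiter (it does not occur in the expression) and a made-up sample
string. The hyphen is written last in the path class so that it is
always taken literally:

```php
<?php
// Path class with the hyphen last, so it cannot form a range.
$expr = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}0-9!$%&\\()+./;=^_~-]*'
      . '|mailto:[a-zA-Z0-9\\@.-]+';

// Assumption: '#' delimiters and the 'u' (UTF-8) modifier are choices
// made for this sketch, not part of the original expression.
$pattern = '#' . $expr . '#u';

// Hypothetical input, for illustration only.
$text = 'See http://example.com/a+b%20c and mailto:someone@example.com today';

preg_match_all($pattern, $text, $m);
print_r($m[0]);
```

With that input, $m[0] should hold the http URL and the mailto link,
in that order.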

> What is the context for the matching?

This will be used to pick out URLs in Creole wiki markup.
Incidentally, a URL in Creole is not supposed to include characters
that can occur naturally at the end of a sentence (,.?!:;"'), so I
guess I need to leave out '.' and ';' for my particular application.

So given markup:

  Please visit http://www.yahoo.com/usèrs+100%&lusers$/~jerry/y_a-n.g/Yahoo;=^(!)foo.

The regex should match (minus the dot at the end):

  [http://www.yahoo.com/usèrs+100%&lusers$/~jerry/y_a-n.g/Yahoo;=^(!)foo]

although in practice a URL this crazy should probably be formalized
with square brackets as defined by Creole for links.
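
One possible way to get that result without removing '.' and ';' from
the path class (which would also cut the URL short at "y_a-n.g") is a
negative lookbehind at the end of the pattern. This is a swapped-in
technique, not something settled in this thread: the greedy class
first takes the trailing dot, then backtracks until the last matched
character is not sentence punctuation. A sketch, assuming '#'
delimiters and the u modifier:

```php
<?php
$url  = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}0-9!$%&\\()+./;=^_~-]*';
$mail = 'mailto:[a-zA-Z0-9\\@.-]+';

// Assumption: the lookbehind rejects a match whose final character is
// one of the sentence-ending characters , . ? ! : ; " '
$pattern = '#(?:' . $url . '|' . $mail . ')(?<![,.?!:;"\'])#u';

$text = 'Please visit http://www.yahoo.com/usèrs+100%&lusers$/~jerry/y_a-n.g/Yahoo;=^(!)foo.';

if (preg_match($pattern, $text, $m)) {
    echo $m[0], "\n";   // the URL without the final dot
}
```

The dots inside the host and the path are kept; only the punctuation
at the very end of the match is given back.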

Mike

-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/


