[nycphp-talk] A good PCRE expression for matching URLs

Michael B Allen ioplex at gmail.com
Thu Jul 24 19:50:13 EDT 2008


On Thu, Jul 24, 2008 at 5:34 PM, John Campbell <jcampbell1 at gmail.com> wrote:
> On Thu, Jul 24, 2008 at 4:32 PM, Michael B Allen <ioplex at gmail.com> wrote:
>> On Thu, Jul 24, 2008 at 2:37 PM, John Campbell <jcampbell1 at gmail.com> wrote:
>>> On Thu, Jul 24, 2008 at 2:19 PM, Michael B Allen <ioplex at gmail.com> wrote:
>>> What is the context for the matching?
>>
>> This will be used to pick out URLs in Creole Wiki markup. Which
>> incidentally is not supposed to match characters that can occur
>> naturally at the end of a sentence (,.?!:;"') so I guess I need to
>> leave out '.' and ';' for my particular application.
>>
>
> Many URLs contain a question mark.  Why not just accept anything
> except a period or a question mark at the end?
>  (http://|ftp://|mailto:).*?[\.\?]?\s

Even though URLs should be escaped on output regardless, I think the
match is also a good opportunity to validate them.

But it would be nice to exclude that end-of-sentence punctuation from
the captured output. I tried the following minimal expression just to
get the trailing condition right, but it cannot distinguish between a
dot that is part of the URL and a period at the end of a sentence.

  $expr = '(http://[a-z./]+)\\. ';

Your expression doesn't seem to work for me either. It seems that '.*'
is just matching everything.
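
One possible way to get the trailing condition, as a rough sketch only:
match the URL body lazily and use a lookahead so that any run of
sentence punctuation followed by whitespace (or the end of the input)
stays outside the match. The function name find_urls, the '~'
delimiter, and the exact punctuation set (taken from the Creole list
quoted above) are assumptions for illustration, and this has not been
checked against the full Creole rules.

```php
<?php
// Hypothetical helper (name and signature are assumptions, not from the
// thread): return the URLs found in $text without trailing punctuation.
function find_urls(string $text): array
{
    // \S+? grows one character at a time; the lookahead lets the match
    // stop only where optional sentence punctuation is followed by
    // whitespace or the end of the input. '~' is used as the delimiter
    // so the slashes in 'http://' need no escaping.
    $expr = '~(?:http://|ftp://|mailto:)\S+?(?=[,.?!:;"\']*(?:\s|$))~';
    preg_match_all($expr, $text, $matches);
    return $matches[0];
}

print_r(find_urls('See http://example.com/a.html?x=1. Or ftp://example.com/f, ok?'));
```

With that input the interior dot and question mark stay part of the
first URL, while the period and comma that end each one are left out.
Because the punctuation test lives in a lookahead, it is never consumed,
so nothing needs to be trimmed from the result afterwards.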

Mike

-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/


