NYPHP.org

[nycphp-talk] Re: regexp for URLs (is this correct?)

Jayesh Sheth jayeshsh at ceruleansky.com
Mon May 3 22:08:27 EDT 2004


Hello all,

Thanks for the excellent pointers regarding URL validation.
I think that since I am only validating http:// and https:// URLs for 
now, the really long one (30 - 50 lines) would be too much to incorporate 
for this job.

But I just realized some things that were wrong with my previous regular 
expressions (those matching 'http://www.google.com' and 'www.google.com' 
respectively):

a) I could check for an optional slash at the end by using something like:
/?

b) In both cases, input such as the following would fail:
http://www.google.com/something/something.html

OR
www.google.com/something/something.html

With a bit of experimenting (I really need to upload my interactive 
Perl-regex tester script to my public scripts area), I came up with the 
following:

/*
Should match:
http://www.google.com/something/something.html
http://www.google.com/something/something
http://www.google.com/something/something/
http://www.google.com/
http://www.google.com
*/
#^(([a-z]{3,5})://)(([0-9a-z-]+\.)+[0-9a-z]{2,6})((/[0-9a-z-]*)+?/?([0-9a-z-.]*)+?)?$#i

and
/*
Should match:
www.google.com/something/something.html
www.google.com/something/something
www.google.com/something/something/
www.google.com/
www.google.com
*/

#^(([0-9a-z-]+\.)+[0-9a-z]{2,6})((/[0-9a-z-]*)+?/?([0-9a-z-.]*)+?)?$#i

It is getting close to my bed time now, so I am not sure how correct 
these are. I will do some more testing tomorrow. If, however, they do 
work, they might be of use to others.
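
For anyone who wants to try them, a quick check could look roughly like the
sketch below. It is only a sketch: the constant and function names are made up
for illustration, and the patterns are written with the trailing path group
optional (the final "?" before "$"), which is an assumption on my part so that
the bare-host inputs in the two lists can match, since without it at least one
"/" after the host is required.

```php
<?php
// Hypothetical names, for illustration only. Both patterns end in ")?$" so
// that the path portion is optional (an assumption; without it a "/" after
// the host is mandatory and "http://www.google.com" is rejected).
const URL_WITH_SCHEME =
    '#^(([a-z]{3,5})://)(([0-9a-z-]+\.)+[0-9a-z]{2,6})((/[0-9a-z-]*)+?/?([0-9a-z-.]*)+?)?$#i';
const URL_WITHOUT_SCHEME =
    '#^(([0-9a-z-]+\.)+[0-9a-z]{2,6})((/[0-9a-z-]*)+?/?([0-9a-z-.]*)+?)?$#i';

function matches_pattern(string $pattern, string $input): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $input) === 1;
}

// The five path variants from the "Should match" lists.
$paths = [
    '/something/something.html',
    '/something/something',
    '/something/something/',
    '/',
    '',
];

foreach ($paths as $path) {
    $bare = 'www.google.com' . $path;
    $full = 'http://' . $bare;

    printf("%-50s %s\n", $full, matches_pattern(URL_WITH_SCHEME, $full) ? 'match' : 'no match');
    printf("%-50s %s\n", $bare, matches_pattern(URL_WITHOUT_SCHEME, $bare) ? 'match' : 'no match');
}
```

Written this way, all ten inputs should report a match. Note that anything
with a query string or a port (for example
http://www.google.com/search?q=php) is rejected, because "?", "=" and ":" are
not in any of the character classes.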

If anyone finds anything wrong with them, please let me know.

Best Regards,
- Jay

PS: I changed my regexps to allow 6-letter top-level domains ( .museum ) 
after reading some responses today. That is the [0-9a-z]{2,6} part.
PPS: I will be using these expressions mostly to evaluate URLs newly 
submitted via a text input box. The other regexps that I posted 
recently were for batch validation (and transformation / import) of lots 
of invalid MySQL data. Thanks for the fopen() tip, David.



