[nycphp-talk] REGEXP Solution Needed

justin justin at justinhileman.info
Wed Sep 8 10:42:10 EDT 2010


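That's not going to do it for you. This should work:

    [?&]size=(\d|1[1-9]|[02-9]\d|\d{3,})(&|#|$)

The end chunks force this to match a whole param: the leading [?&] and
the trailing (&|#|$) ensure that it actually starts and ends with query
string delimiters, so "fontsize=12" or "size=12abc" won't match.

The "size=" part is obvious.

The mess in the middle says "any number except 10": a single digit,
11-19, any two digits not starting with 1, or three or more digits.
There's no lookahead in there, since GNU regex doesn't support it. It
could probably be refined a bit, but this does the trick :)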
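
A quick way to sanity-check a pattern like this is to run it against a
handful of URLs before pasting it into the appliance. The sketch below
is only that: it assumes PHP's preg_match (PCRE) behaves the same as
GNU regex for this pattern, it uses only the plain alternation branches
(no lookahead), and the sample URLs and the sizeIsNot10() helper are
made up for illustration.

```php
<?php
// The "any number except 10" alternation, wrapped in PCRE delimiters.
// Assumption: PCRE and the appliance's GNU regex agree on this pattern.
$pattern = '/[?&]size=(\d|1[1-9]|[02-9]\d|\d{3,})(&|#|$)/';

// Hypothetical helper: true when the URL carries a numeric size param
// whose value is anything other than 10.
function sizeIsNot10(string $pattern, string $url): bool
{
    return preg_match($pattern, $url) === 1;
}

// Made-up sample URLs covering the delimiter and value cases.
$samples = [
    'http://www.example.com/events/events?size=10',
    'http://www.example.com/events/events?size=12',
    'http://www.example.com/events/events?page=2&size=5',
    'http://www.example.com/events/events?size=100&page=2',
    'http://www.example.com/events/events?size=10#top',
    'http://www.example.com/events/events?fontsize=12',
];

foreach ($samples as $url) {
    // "block" = pattern matched (size present and not 10).
    printf("%-55s %s\n", $url, sizeIsNot10($pattern, $url) ? 'block' : 'allow');
}
```

Only URLs with a real "size" param other than 10 should come out as
"block"; the size=10 and fontsize=12 ones should come out as "allow".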

-- justin



On Wed, Sep 8, 2010 at 10:27 AM,  <ps at blu-studio.com> wrote:
> I believe this is what I am looking for:
> ^http://www\\.example\\.com/events/events?.*size=[\d|\d\d^10].*
>
> If anyone can polish this more or if I am wrong, pls give a note. Thanks.
>
> -------- Original Message --------
> Subject: Re: [nycphp-talk] REGEXP Solution Needed
> From: <ps at blu-studio.com>
> Date: Wed, September 08, 2010 6:52 am
> To: "NYPHP Talk" <talk at lists.nyphp.org>
>
> This is a great technique, thanks, Scott.
> But I'm putting this into the Do Not Crawl front end of a Google Search
> Appliance and it has to be done with GNU regexp. So I've been working on it
> and I got something like this for starters:
> ^http://www\\.example\\.com/events/events\\?\.size=[\d|\d\d^10]\.
>
> Where with the above I am intending to match my domain, then the directory
> path events/events followed by a question mark, then any characters leading
> up to size = any one or two digits but not 10 followed by any characters.
> That is where I need to be going.
> Peter
>
> -------- Original Message --------
> Subject: Re: [nycphp-talk] REGEXP Solution Needed
> From: Scott Mattocks <scott at crisscott.com>
> Date: Wed, September 08, 2010 6:19 am
> To: NYPHP Talk <talk at lists.nyphp.org>
>
> On 09/08/2010 08:30 AM, ps at blu-studio.com wrote:
>> Using GNU Regular Expressions I need to examine a URL like those below,
>> checking the size key and value. I need to capture and block all URLs
>> where 'size does not equal 10'. In other words, "size=12" is not
>> acceptable.
>
> Regular expressions are expensive and should only be used when
> absolutely necessary. If you are checking for a specific string, just
> check for it with str* functions. Here's how I would check for it:
>
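> $key = 'size';
> $val = 10;
> $url = 'http://....';
>
> // Position of the last 'size=' in the URL, or false if it is absent.
> $last = strrpos($url, $key . '=');
> // Good only if that last 'size=' is also where the last 'size=10' starts.
> if ($last !== false && $last === strrpos($url, $key . '=' . $val))
> {
>     echo 'Good';
> }
> else
> {
>     echo 'Bad';
> }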
>
> That block of code makes sure that 'size=' shows up in your URL and that
> the last occurrence of 'size=' is actually 'size=10'. The last
> occurrence is the value that will be passed to the server so that's
> probably the only one you care about. If you want to verify that there
> is only one occurrence use strpos(...) == strrpos(...) in addition to
> the checks above.
>
> --
> Scott Mattocks
> _______________________________________________
> New York PHP Users Group Community Talk Mailing List
> http://lists.nyphp.org/mailman/listinfo/talk
>
> http://www.nyphp.org/Show-Participation
>



-- 
justin
http://justinhileman.com


