NYCPHP Meetup

NYPHP.org

[nycphp-talk] REGEXP Solution Needed

justin justin at justinhileman.info
Wed Sep 8 13:05:43 EDT 2010


that doesn't do what you think it does. it will fail on

http://www.example.com/events/events?node=&start=0&size=0&sort=event
http://www.example.com/events/events?node=&start=0&size=1&sort=event
http://www.example.com/events/events?node=&start=0&size=11&sort=event
http://www.example.com/events/events?node=&start=0&size=100&sort=event

and any other "size" value starting with 0 or 1, since [^10] matches exactly one character that is neither a 1 nor a 0.
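A quick way to see this is to run Peter's pattern through PHP's preg_match(). That uses PCRE rather than the appliance's GNU engine, so treat it as a sketch of the regex logic only. In a single-quoted PHP string the doubled backslashes from the post collapse to single ones. The size=12 and size=25 URLs are extra examples, not from the thread.

```php
<?php
// Peter's pattern as a single-quoted PHP string: '\\.' becomes '\.'.
// '#' is used as the delimiter because the pattern contains '/'.
$peter = '#^http://www\\.example\\.com/events/events?.*size=[^10].*#';

// The last two sizes (12, 25) are extra examples, not from the thread.
$sizes = ['0', '1', '11', '100', '12', '25'];

foreach ($sizes as $size) {
    $url = "http://www.example.com/events/events?node=&start=0&size=$size&sort=event";
    // [^10] consumes exactly one character, which must not be '1' or '0',
    // so every size starting with 0 or 1 slips past the block rule.
    $verdict = preg_match($peter, $url) ? 'blocked' : 'NOT blocked';
    echo "size=$size: $verdict\n";
}
```

Only size=25 comes out as blocked; size=12, the very case that should be blocked, is not.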

use this instead:

   [?&]size=((?!10)|\d|1[1-9]|[02-9]\d|\d{3,})(&|#|$)
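A sketch for checking this pattern in PHP with preg_match() (PCRE). PCRE understands the (?!10) lookahead and \d; POSIX extended regex syntax has neither, so whether the appliance accepts the pattern as written is not tested here. The (?!10) branch can only ever match an empty value, so the second pattern (the name $posixish is hypothetical) replaces it with an optional group and \d with [0-9]. It is also untested on the appliance.

```php
<?php
// justin's pattern, run through PCRE via preg_match().
$justin = '/[?&]size=((?!10)|\d|1[1-9]|[02-9]\d|\d{3,})(&|#|$)/';

// Hypothetical lookahead-free variant: the (?!10) branch above only ever
// matches an empty value, so an optional group does the same job here.
$posixish = '/[?&]size=([0-9]|1[1-9]|[02-9][0-9]|[0-9]{3,})?(&|#|$)/';

foreach (['', '0', '1', '10', '11', '12', '100'] as $size) {
    $url = "http://www.example.com/events/events?node=&start=0&size=$size&sort=event";
    // 1 means "matched", i.e. the URL would be blocked.
    printf("size=%-4s justin=%d variant=%d\n", $size,
        preg_match($justin, $url), preg_match($posixish, $url));
}
```

Both patterns match every size except 10, including an empty size and size=100.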


-- justin


On Wed, Sep 8, 2010 at 11:45 AM,  <ps at blu-studio.com> wrote:
>
> As usual, lots of great input, but here is what seems to work for me when
> testing it against some actual URLs:
> ^http://www\\.example\\.com/events/events?.*size=[^10].*
>
> Just using size does not equal 10.
>
> -------- Original Message --------
> Subject: Re: [nycphp-talk] REGEXP Solution Needed
> From: John Campbell <jcampbell1 at gmail.com>
> Date: Wed, September 08, 2010 7:47 am
> To: NYPHP Talk <talk at lists.nyphp.org>
>
> On Wed, Sep 8, 2010 at 10:27 PM, <ps at blu-studio.com> wrote:
>> I believe this is what I am looking for:
>> ^http://www\\.example\\.com/events/events?.*size=[\d|\d\d^10].*
>
> Test that, but I am quite sure it doesn't do what you want. I think
> you need negative lookahead, which typically has syntax like
>
> size=(?!10)
>
> but that isn't quite right, because it will also negate size=100.
>
> so I think you need:
>
> (size=(?!10))|(size=\d{3,})
>
> Regards,
> John Campbell
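John's pattern with its parentheses balanced, run in PHP (PCRE) next to the bare lookahead, shows the size=100 problem he describes and how the \d{3,} branch fixes it. The variable names are made up for the example.

```php
<?php
// John's two alternatives with the parentheses balanced, run through PCRE.
$lookaheadOnly = '/size=(?!10)/';
$john = '/(size=(?!10))|(size=\d{3,})/';

foreach (['12', '10', '100', '101'] as $size) {
    $url = "http://www.example.com/events/events?node=&start=0&size=$size&sort=event";
    // (?!10) alone also rejects 100 and 101; the \d{3,} branch brings them back.
    printf("size=%-3s lookahead only=%d with \\d{3,}=%d\n", $size,
        preg_match($lookaheadOnly, $url), preg_match($john, $url));
}
```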
>
>> If anyone can polish this more or if I am wrong, pls give a note. Thanks.
>>
>> -------- Original Message --------
>> Subject: Re: [nycphp-talk] REGEXP Solution Needed
>> From: <ps at blu-studio.com>
>> Date: Wed, September 08, 2010 6:52 am
>> To: "NYPHP Talk" <talk at lists.nyphp.org>
>>
>> This is a great technique, thanks, Scott.
>> But I'm putting this into the Do Not Crawl front end of a Google Search
>> Appliance, and it has to be done with GNU regexp. So I've been working on
>> it, and I got something like this for starters:
>> ^http://www\\.example\\.com/events/events\\?\.size=[\d|\d\d^10]\.
>>
>> With the above, I am intending to match my domain, then the directory
>> path events/events followed by a question mark, then any characters leading
>> up to size = any one or two digits but not 10, followed by any characters.
>> That is where I need to be going.
>> Peter
>>
>> -------- Original Message --------
>> Subject: Re: [nycphp-talk] REGEXP Solution Needed
>> From: Scott Mattocks <scott at crisscott.com>
>> Date: Wed, September 08, 2010 6:19 am
>> To: NYPHP Talk <talk at lists.nyphp.org>
>>
>> On 09/08/2010 08:30 AM, ps at blu-studio.com wrote:
>>> Using GNU regular expressions, I need to examine URLs like those below,
>>> checking the size key and value. I need to capture and block all URLs
>>> where size does not equal 10. In other words, "size=12" is not
>>> acceptable.
>>
>> Regular expressions are expensive and should only be used when
>> absolutely necessary. If you are checking for a specific string, just
>> check for it with str* functions. Here's how I would check for it:
>>
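>> $key = 'size';
>> $val = 10;
>> $url = 'http://....';
>>
>> // Position of the last "size=" in the URL, or false if there is none.
>> $last = strrpos($url, $key . '=');
>> $needle = $key . '=' . $val;
>> // The character right after "size=10" ('' at the end of the URL) must end
>> // the value, so that e.g. "size=100" is not accepted as "size=10".
>> $next = ($last === false) ? '' : substr($url, $last + strlen($needle), 1);
>> if ($last !== false && $last === strrpos($url, $needle)
>>     && ($next === '' || $next === '&' || $next === '#'))
>> {
>> echo 'Good';
>> }
>> else
>> {
>> echo 'Bad';
>> }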
>>
>> That block of code makes sure that 'size=' shows up in your URL and that
>> the last occurrence of 'size=' is actually 'size=10'. The last
>> occurrence is the value that will be passed to the server so that's
>> probably the only one you care about. If you want to verify that there
>> is only one occurrence use strpos(...) == strrpos(...) in addition to
>> the checks above.
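A sketch of the same check built on PHP's parse_url() and parse_str() instead of strrpos(), swapped in here as an alternative rather than Scott's method. parse_str() keeps the last value when a key repeats, matching the "last occurrence" reasoning above, and because it splits the query on '&' it will not mistake size=100 or pagesize=10 for size=10. The function name sizeIsTen() is hypothetical.

```php
<?php
// Hypothetical helper: true if the (last) "size" query value is exactly "10".
function sizeIsTen(string $url, string $key = 'size', string $val = '10'): bool
{
    // Returns the query string, or null/false if there is none or the URL is malformed.
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return false;
    }
    // Later duplicates overwrite earlier ones, so this is the last "size".
    parse_str($query, $params);
    // Compare as strings so that "010" or "10.0" do not count as 10.
    return isset($params[$key]) && $params[$key] === $val;
}

echo sizeIsTen('http://www.example.com/events/events?node=&start=0&size=10&sort=event') ? 'Good' : 'Bad', "\n";
```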
>>
>> --
>> Scott Mattocks
>> _______________________________________________
>> New York PHP Users Group Community Talk Mailing List
>> http://lists.nyphp.org/mailman/listinfo/talk
>>
>> http://www.nyphp.org/Show-Participation



-- 
justin
http://justinhileman.com


