NYCPHP Meetup

NYPHP.org

[nycphp-talk] iterating through a multibyte string

Ben Sgro ben at projectskyline.com
Thu Jan 14 16:58:11 EST 2010


Mike,

I've seen the exact opposite argument: that the str_* family of functions
is faster than regex.
And, um, PHP is implemented in C, so isn't all the work done in C at the 
end of the day?

The str_* functions will be optimized; the (dynamic) regex will not.

I'm confused by your logic...

- Ben
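For comparison, here is a minimal sketch of the two styles being debated, each splitting a UTF-8 string into an array of characters. The first uses a single preg_split call with the 'u' modifier, like the regex approach in Dan's test. The second uses only byte-level string functions (strlen, ord, substr) and reads each lead byte to decide how long the character is. It keeps an integer offset instead of reassigning a $rest string on every pass. Both function names are hypothetical, and both assume valid UTF-8 input. Which one is faster depends on your PHP build and data, so measure it rather than take either side's word for it.

```php
<?php
// Regex style: one preg_split call does the whole job inside PCRE.
// The 'u' modifier makes PCRE treat the subject as UTF-8, so the empty
// pattern matches between characters rather than between bytes.
// PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
function split_chars_regex(string $s): array
{
    $parts = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        // preg_split returns false when the subject is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $parts;
}

// String-function style: walk the bytes, using the lead byte of each
// UTF-8 sequence to determine how many bytes the character occupies.
// Assumes valid UTF-8 (continuation bytes never appear as lead bytes).
function split_chars_bytes(string $s): array
{
    $chars = [];
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $lead = ord($s[$i]);
        if ($lead < 0x80) {
            $n = 1;             // 0xxxxxxx: ASCII
        } elseif ($lead < 0xE0) {
            $n = 2;             // 110xxxxx: 2-byte sequence
        } elseif ($lead < 0xF0) {
            $n = 3;             // 1110xxxx: 3-byte sequence
        } else {
            $n = 4;             // 11110xxx: 4-byte sequence
        }
        $chars[] = substr($s, $i, $n);
        $i += $n;               // advance the offset; no string copying
    }
    return $chars;
}
```

Called on "h\u{e9}llo", both functions should return the same five-element array, with the accented character kept as a single two-byte element.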

Michael B Allen wrote:
> On Wed, Jan 13, 2010 at 2:24 PM, Chris Snyder <chsnyder at gmail.com> wrote:
>   
>> On Wed, Jan 13, 2010 at 1:54 PM, Dan Cech <dcech at phpwerx.net> wrote:
>>
>>     
>>> but still can't beat preg_split, most likely because of the overhead
>>> involved in overwriting $rest on every pass through the loop.
>>>       
>> Regex wins a speed test. Nice thread!
>>     
>
> The more work done in C the faster it will be. So even though there is
> the overhead of compiling the regex expression, it frequently turns
> out to be faster. This is why I always use regex for tokenizers when
> parsing complex data (I did this for a Creole Wiki markup
> interpreter).
>
> Note that you can also convert the string to something like UCS-2BE
> using iconv and then each character will occupy exactly 2 bytes in the
> resulting string, allowing you to iterate over it in a mostly
> conventional way. This might turn out to be faster, but I'm not sure.
>
> Mike
>
>   
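A minimal sketch of the iconv suggestion, assuming UTF-8 input and the iconv extension. After conversion to UCS-2BE, character k sits at byte offset 2k. Plain strlen/substr indexing then works, and each 2-byte unit is converted back to UTF-8 for use. One caveat: UCS-2 can only represent characters in the Basic Multilingual Plane (U+0000 to U+FFFF), so input containing characters such as emoji will typically fail to convert. Converting to UTF-32BE (a fixed 4 bytes per character) keeps the same fixed-width idea without that limit. The function name and the [code point, character] return shape are hypothetical choices for illustration.

```php
<?php
// Iterate over a UTF-8 string one character at a time by converting it
// to UCS-2BE, where every character is exactly 2 bytes wide.
// Assumes UTF-8 input containing only BMP characters (U+0000..U+FFFF).
function each_char_ucs2(string $utf8): array
{
    $ucs2 = iconv('UTF-8', 'UCS-2BE', $utf8);
    if ($ucs2 === false) {
        throw new InvalidArgumentException('Input could not be converted to UCS-2BE');
    }
    $chars = [];
    // Fixed width: step through the converted string 2 bytes at a time.
    for ($i = 0, $len = strlen($ucs2); $i < $len; $i += 2) {
        $unit = substr($ucs2, $i, 2);
        // Big-endian: code point = high byte * 256 + low byte.
        $codepoint = (ord($unit[0]) << 8) | ord($unit[1]);
        // Convert the 2-byte unit back to UTF-8 so it can be used normally.
        $chars[] = [$codepoint, iconv('UCS-2BE', 'UTF-8', $unit)];
    }
    return $chars;
}
```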


