[nycphp-talk] iterating through a multibyte string

Michael B Allen ioplex at gmail.com
Thu Jan 14 12:03:56 EST 2010


On Wed, Jan 13, 2010 at 2:24 PM, Chris Snyder <chsnyder at gmail.com> wrote:
> On Wed, Jan 13, 2010 at 1:54 PM, Dan Cech <dcech at phpwerx.net> wrote:
>
>> but still can't beat preg_split, most likely because of the overhead
>> involved in overwriting $rest on every pass through the loop.
>
> Regex wins a speed test. Nice thread!

The more work that is done in C, the faster it will be. So even though
there is the overhead of compiling the regular expression, regex
frequently turns out to be faster. This is why I always use regex for
tokenizers when parsing complex data (I did this for a Creole Wiki
markup interpreter).
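
For reference, a minimal sketch of the preg_split approach mentioned
earlier in the thread. The function name utf8_chars is hypothetical,
and the pattern shown (an empty pattern with the /u modifier) is one
common idiom, not necessarily the exact one that was benchmarked. It
assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters.
function utf8_chars($str)
{
    // With the /u modifier the empty pattern matches at every character
    // boundary, so the split happens in C inside PCRE.
    // PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure, e.g. for invalid UTF-8 input.
    return $chars === false ? array() : $chars;
}

// Usage: "\xC3\xA9" is the two-byte UTF-8 encoding of e-acute.
foreach (utf8_chars("caf\xC3\xA9") as $ch) {
    echo $ch, "\n";
}
```

The loop body sees one whole character per iteration, regardless of
how many bytes it occupies.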

Note that you can also convert the string to something like UCS-2BE
using iconv. Each character will then occupy exactly 2 bytes in the
resulting string, allowing you to iterate over it in the mostly
conventional way. This might turn out to be faster, but I'm not sure.
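
A rough sketch of the UCS-2BE idea, under some assumptions: the input
is valid UTF-8, every character fits in 2 bytes (UCS-2 cannot
represent characters outside the Basic Multilingual Plane), and the
function name ucs2_codepoints is hypothetical. It returns code points
rather than characters to keep the example short; it has not been
benchmarked against preg_split.

```php
<?php
// Hypothetical helper: return the code point of each character in a
// UTF-8 string by walking a fixed-width UCS-2BE copy of it.
function ucs2_codepoints($str)
{
    // Convert once; each character is now exactly 2 bytes, big-endian.
    $ucs2 = iconv('UTF-8', 'UCS-2BE', $str);
    if ($ucs2 === false) {
        // Conversion failed (invalid input or unrepresentable character).
        return array();
    }

    $points = array();
    for ($i = 0, $n = strlen($ucs2); $i < $n; $i += 2) {
        // High byte first, then low byte.
        $points[] = (ord($ucs2[$i]) << 8) | ord($ucs2[$i + 1]);
    }
    return $points;
}

// Usage: "\xC3\xA9" is the UTF-8 encoding of U+00E9 (e-acute).
foreach (ucs2_codepoints("caf\xC3\xA9") as $cp) {
    printf("U+%04X\n", $cp);
}
```

To get characters back instead of code points, each 2-byte slice can
be passed through iconv('UCS-2BE', 'UTF-8', ...), at the cost of one
extra call per character.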

Mike

-- 
Michael B Allen
PHP Active Directory Integration
http://www.ioplex.com/plexcel.html


