[nycphp-talk] iterating through a multibyte string

Michael B Allen ioplex at
Thu Jan 14 12:03:56 EST 2010

On Wed, Jan 13, 2010 at 2:24 PM, Chris Snyder <chsnyder at> wrote:
> On Wed, Jan 13, 2010 at 1:54 PM, Dan Cech <dcech at> wrote:
>> but still can't beat preg_split, most likely because of the overhead
>> involved in overwriting $rest on every pass through the loop.
> Regex wins a speed test. Nice thread!

The more work done in C, the faster it will be. So even though there is
the overhead of compiling the regular expression, it frequently turns
out to be faster. This is why I always use regex for tokenizers when
parsing complex data (I did this for a Creole Wiki markup
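
As a rough illustration of the regex-tokenizer idea, here is a minimal
sketch that lets a single preg_match_all call do the scanning in C. The
function name tokenize and the three token types (bold, italic, text)
are hypothetical and chosen only for this example; this is not the
Creole tokenizer mentioned above. The /u modifier assumes the input is
valid UTF-8.

```php
<?php
// Hypothetical tokenizer sketch: one regex with named alternatives.
// Token names and patterns are assumptions made for illustration only.
function tokenize(string $input): array
{
    // bold   = literal "**"
    // italic = literal "//"
    // text   = longest run of characters that does not start a "**" or "//"
    $pattern = '~(?<bold>\*\*)|(?<italic>//)|(?<text>(?:(?!\*\*|//).)+)~us';

    // The whole scan happens inside PCRE; PHP only walks the result.
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $tokens = [];
    foreach ($matches as $m) {
        foreach (['bold', 'italic', 'text'] as $type) {
            // Exactly one named group is non-empty for each match.
            if (isset($m[$type]) && $m[$type] !== '') {
                $tokens[] = [$type, $m[$type]];
                break;
            }
        }
    }
    return $tokens;
}
```

Each token is a [type, value] pair, so a parser can switch on the type
without looking at individual bytes of the multibyte input.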

Note that you can also convert the string to something like UCS-2BE
using iconv. Each character then occupies exactly 2 bytes in the
resulting string, which lets you iterate over it in a mostly
conventional way. This might turn out to be faster, but I'm not sure.
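
A minimal sketch of the iconv approach, assuming the input is valid
UTF-8 and that the iconv extension is available. The function name
ucs2_code_points is hypothetical. One caveat: UCS-2 can only represent
characters in the Basic Multilingual Plane, so for anything outside it
the conversion is expected to fail rather than produce 2-byte units.
No speed claim is made here; it would need to be benchmarked against
preg_split.

```php
<?php
// Sketch: walk a UTF-8 string as fixed-width 2-byte units via UCS-2BE.
// Assumption: every character in $utf8 is inside the BMP.
function ucs2_code_points(string $utf8): array
{
    $ucs2 = iconv('UTF-8', 'UCS-2BE', $utf8);
    if ($ucs2 === false) {
        throw new InvalidArgumentException('Input could not be converted to UCS-2BE');
    }

    $codePoints = [];
    $len = strlen($ucs2);
    for ($i = 0; $i < $len; $i += 2) {
        // Big-endian: the high byte comes first, then the low byte.
        $codePoints[] = (ord($ucs2[$i]) << 8) | ord($ucs2[$i + 1]);
    }
    return $codePoints;
}
```

The loop yields integer code points; to get each character back as a
string, substr($ucs2, $i, 2) can be passed through iconv in the other
direction, at the cost of one extra call per character.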


Michael B Allen
PHP Active Directory Integration
