[nycphp-talk] iterating through a multibyte string

Dan Cech dcech at phpwerx.net
Wed Jan 13 13:54:33 EST 2010


Rob Marscher wrote:
> On Jan 13, 2010, at 12:44 PM, John Campbell wrote:
>> You forgot
>> mb_internal_encoding("UTF-8");
>>
>> without that, mb_substr is just an alias for substr
> 
> Thanks, John.  I thought I had that set in my php.ini - but I must have overwritten my php.ini with a new install since then.

Good catch!  I missed that too...
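
For anyone following along, here is a minimal sketch of why the encoding setting matters. It is not from the benchmark script; the sample string and variable names are my own, and it assumes the mbstring extension is loaded and mbstring.func_overload is off. Passing the encoding explicitly to each mb_* call also sidesteps whatever mb_internal_encoding happens to be set to:

```php
<?php
// "h\xC3\xA9llo" is "héllo" encoded as UTF-8: 5 characters, 6 bytes.
$s = "h\xC3\xA9llo";

$bytes     = strlen($s);                  // counts bytes
$charsUtf8 = mb_strlen($s, 'UTF-8');      // counts UTF-8 characters
$charsL1   = mb_strlen($s, 'ISO-8859-1'); // single-byte encoding: one "character" per byte

// With the right encoding the second character comes back whole;
// with a single-byte encoding only its first byte does.
$secondUtf8 = mb_substr($s, 1, 1, 'UTF-8');
$secondL1   = mb_substr($s, 1, 1, 'ISO-8859-1');

printf("bytes=%d utf8=%d latin1=%d\n", $bytes, $charsUtf8, $charsL1);
printf("second char: %d byte(s) as UTF-8, %d byte(s) as ISO-8859-1\n",
	strlen($secondUtf8), strlen($secondL1));
```

If the internal encoding is a single-byte one, the mb_substr loop silently walks bytes rather than characters, which would make the timings look much better than they should.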

>> my results look like:
>>
>> normal iteration took 0.64724087715149
>> mb_substr method took 16.471849918365
>> mb_substr method with shortening the string took 21.613878965378
>> preg_split method took 1.927277803421
>>
>> Dan is the winner.  preg_split always runs in linear time.  Both of
>> the mb_substr are O(N^2), because the first step in mb_substr is
>> splitting the string into array.  It is not as intelligent as I
>> initially assumed.
> 
> Thanks for the analysis!  I got similar results on the new run too.

I worked up a quick alternative that avoided mb_substr for calculating
$rest:

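for ($i = 0; $i < $repeats; $i++) {
	$newStr = '';
	$rest = $str;
	// Test the length: while ($rest) would end the loop early once the
	// remainder is the string "0", which PHP treats as false.
	while (strlen($rest) > 0) {
		// Take the first (possibly multibyte) character...
		$c = mb_substr($rest, 0, 1);
		$newStr .= $c;
		// ...then drop its bytes with plain substr, which does no decoding.
		$rest = substr($rest, strlen($c));
	}
}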

As long as you don't have mbstring.func_overload enabled (it would swap
substr and strlen for their mb_ equivalents), it is much more efficient
than shortening the string using mb_substr:

normal iteration took 0.95997190475464
mb_substr method took 19.002305984497
mb_substr method with shortening the string took 25.623261928558
mb_substr method with shortening the string using substr took 6.5963559150696
preg_split method took 2.5313749313354

It still can't beat preg_split, though, most likely because of the
overhead involved in overwriting $rest on every pass through the loop.
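
For reference, the preg_split code itself is not quoted in this message. A minimal sketch of that kind of approach might look like the following. The helper name utf8_chars and the sample string are hypothetical, and I am assuming the split was done with an empty pattern and the /u modifier:

```php
<?php
// Hypothetical helper, not from the thread: split a UTF-8 string into
// an array of characters in a single pass.
function utf8_chars($str)
{
	// An empty pattern with /u matches at every UTF-8 character boundary;
	// PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
	$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
	if ($chars === false) {
		// preg_split returns false on failure, e.g. invalid UTF-8 input.
		return array();
	}
	return $chars;
}

$str = "na\xC3\xAFve 0"; // "naïve 0" as UTF-8, ending in a "0" character
$newStr = '';
foreach (utf8_chars($str) as $c) {
	$newStr .= $c;
}

printf("%d characters, round trip %s\n",
	count(utf8_chars($str)), $newStr === $str ? 'ok' : 'failed');
```

The string is decoded once up front and the loop only walks an array, so nothing gets re-scanned or copied on each pass. The cost is holding the whole array of characters in memory at once.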

Dan


