[nycphp-talk] iterating through a multibyte string

Rob Marscher rmarscher at beaffinitive.com
Wed Jan 13 11:37:57 EST 2010


OK.  Here are the results of my rough benchmark.  Every time I ran it, the results were within about 0.025 seconds of each other, so the numbers seem stable.  Surprisingly, my original mb_substr method won, with preg_split taking just a little bit longer.  John's method of grabbing the first character and then removing it from the string actually seems to take disproportionately more time the longer the string is, growing much faster than linearly.  I set $strSize to 1000 and had to kill it because I didn't want to wait that long.  There must be something pretty inefficient going on in mb_substr to make that the case.  I suppose we could look at the source to get to the bottom of it... but I think I've already spent as much time on this as I'm willing to.  Thanks again to you guys.
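
A rough sketch of one likely contributor to the slowdown, not a reading of the mbstring source: every mb_substr($rest, 1) call has to build a new string holding everything except the first character, so the bytes copied add up to roughly n*n/2 for an n-character string. That is quadratic rather than exponential growth, but it is still enough to make a tenfold longer string cost about a hundred times more. The helper name bytesCopiedByShortening() is made up for illustration, and the code assumes PHP 7 or later with the mbstring extension.

```php
<?php
// Hypothetical helper: adds up the size of every intermediate $rest string
// produced while a string is consumed one character at a time.
function bytesCopiedByShortening(string $str): int
{
    $copied = 0;
    $rest = $str;
    while ($rest !== '') {
        // Drop the first character; the remainder is a freshly built string.
        $rest = mb_substr($rest, 1, null, 'UTF-8');
        $copied += strlen($rest);
    }
    return $copied;
}

echo bytesCopiedByShortening(str_repeat('a', 100)), "\n";  // 4950
echo bytesCopiedByShortening(str_repeat('a', 1000)), "\n"; // 499500
```

This only counts the copying done for $rest. Whatever mb_substr does internally to locate character boundaries comes on top of that.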

$ php mbtest.php
normal iteration took 0.8041729927063
mb_substr method took 1.7228858470917
mb_substr method with shortening the string took 7.9840841293335
preg_split method took 2.1547298431396

$ cat mbtest.php 
<?php

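$strSize = 100;   // how many times the sample line is appended to $str
$repeats = 1000;  // how many times each method is run

// make the mb_* functions count UTF-8 characters regardless of php.ini
mb_internal_encoding('UTF-8');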

// make the string somewhat large
$str = '';
for ($i = 0; $i < $strSize; $i++) {
	$str .= "string with utf-8 chars\n   åèö";
}

// non-multibyte iteration
$start = microtime(true);
for ($i = 0; $i < $repeats; $i++) {
	$length = strlen($str);
	$newStr = '';
	for ($j = 0; $j < $length; $j++) {
		$newStr .= $str[$j];
	}
}
$end = microtime(true);
echo "normal iteration took " . ($end - $start) . "\n";

// mb_substr method
$start = microtime(true);
for ($i = 0; $i < $repeats; $i++) {
	$length = mb_strlen($str);
	$newStr = '';
	$rest = $str;
	for ($j = 0; $j < $length; $j++) {
		$newStr .= mb_substr($rest, $j, 1);
	}
}
$end = microtime(true);
echo "mb_substr method took " . ($end - $start) . "\n";

// mb_substr method, shortening string
$start = microtime(true);
for ($i = 0; $i < $repeats; $i++) {
	$length = mb_strlen($str);
	$newStr = '';
	$rest = $str;
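	// test the length explicitly: a bare while ($rest) stops early if the remainder is "0"
	while (strlen($rest) > 0) {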
		$newStr .= mb_substr($rest, 0, 1);
		$rest = mb_substr($rest, 1);
	}
}
$end = microtime(true);
echo "mb_substr method with shortening the string took " . ($end - $start) . "\n";

// preg_split method
$start = microtime(true);
for ($i = 0; $i < $repeats; $i++) {
	$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
	$length = count($chars);
	$newStr = '';
	for ($j = 0; $j < $length; $j++) {
		$newStr .= $chars[$j];
	}
}
$end = microtime(true);
echo "preg_split method took " . ($end - $start) . "\n";
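
The benchmark only times the loops and never checks what ends up in $newStr, so a quick sanity check that each multibyte method rebuilds the input unchanged seems worthwhile. The following is a minimal sketch, assuming PHP 7 or later with the mbstring and PCRE extensions. The rebuild*() function names are hypothetical and not part of mbtest.php.

```php
<?php
// Hypothetical helpers: one per multibyte strategy used in the benchmark.

function rebuildWithMbSubstr(string $str): string
{
    $out = '';
    $length = mb_strlen($str, 'UTF-8');
    for ($j = 0; $j < $length; $j++) {
        $out .= mb_substr($str, $j, 1, 'UTF-8');
    }
    return $out;
}

function rebuildByShortening(string $str): string
{
    $out = '';
    $rest = $str;
    while ($rest !== '') {
        $out .= mb_substr($rest, 0, 1, 'UTF-8');
        $rest = mb_substr($rest, 1, null, 'UTF-8');
    }
    return $out;
}

function rebuildWithPregSplit(string $str): string
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('preg_split failed (invalid UTF-8?)');
    }
    $out = '';
    foreach ($chars as $char) {
        $out .= $char;
    }
    return $out;
}

// The trailing "0" catches loops that test the remainder for truthiness.
$sample = "string with utf-8 chars\n   \u{e5}\u{e8}\u{f6}0";

foreach (['rebuildWithMbSubstr', 'rebuildByShortening', 'rebuildWithPregSplit'] as $method) {
    echo $method, ': ', ($method($sample) === $sample ? 'ok' : 'MISMATCH'), "\n";
}
```

Each line should report ok. A MISMATCH would mean the timing for that method is measuring something other than a full character-by-character copy.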