[nycphp-talk] PHP + UTF-8 + mb_string issue.

Anirudh Zala arzala at gmail.com
Wed Mar 21 02:39:45 EDT 2007


Subject: Re: [nycphp-talk] PHP + UTF-8 + mb_string issue.
Date: Wednesday 21 March 2007 11:48
From: Anirudh Zala <arzala at gmail.com>
To: Michael B Allen <mba2000 at ioplex.com>

On Wednesday 21 March 2007 11:36, you wrote:
> On Wed, 21 Mar 2007 10:50:26 +0530
>
> Anirudh Zala <arzala at gmail.com> wrote:
> > Hello Everybody,
> >
> > While building a truly multilingual project, I am running into an
> > interesting problem with php5 + utf-8 + mb_string.
>
> <snip>
>
> > ____________  = 1 word; 4 bytes; 2 characters (______, ______); 4
> > key-strokes (___, ___, ___, ___); "strlen" should be 2 but is 4.
>
> Generally the libc-like functions exhibit libc behavior so 4 is the
> correct answer.
>
> Is mb_strlen not suitable for some reason? You have to use mb_* functions
> whenever you perform character-wise operations as opposed to byte-wise
> (and that assumes you're running in the UTF-8 locale).
>
> Mike
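
A minimal sketch of the byte-wise versus character-wise distinction described in the quoted reply. The sample string is an assumption (the original text did not survive in the archive): it is "é" twice, written as raw UTF-8 bytes so that the script does not depend on the file's own encoding. It also assumes the mbstring extension is loaded.

```php
<?php
// Assumed sample: "é" twice, each encoded in UTF-8 as the two bytes 0xC3 0xA9.
$word = "\xC3\xA9\xC3\xA9";

// strlen() is byte-wise, like its libc namesake.
$byteCount = strlen($word);

// mb_strlen() with an explicit encoding counts UTF-8 code points.
$charCount = mb_strlen($word, 'UTF-8');

echo $byteCount, " bytes, ", $charCount, " characters\n";
// prints "4 bytes, 2 characters"
```

Passing 'UTF-8' explicitly avoids relying on the configured internal encoding, which may differ between installations.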

I am using the mb_* functions with UTF-8 as the locale. Everything is
processed transparently in UTF-8 only. I have tested the same thing with the
"iconv" extension and got the same results. It looks like this is the
behavior of PHP + mb_*.
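
One possible explanation, offered only as an assumption since the thread does not confirm it: mb_strlen counts code points, and a character as the user perceives it may be built from several code points (a base letter plus combining marks). The sketch below uses a hypothetical sample, "e" followed by U+0301 COMBINING ACUTE ACCENT, and counts user-perceived characters with the PCRE \X escape. It assumes mbstring is loaded and PCRE was built with UTF-8 support.

```php
<?php
// Hypothetical sample: "e" + U+0301 COMBINING ACUTE ACCENT (bytes 0xCC 0x81).
// It renders as one visible character but consists of two code points.
$text = "e\xCC\x81";

$bytes      = strlen($text);              // byte count
$codePoints = mb_strlen($text, 'UTF-8');  // code point count

// \X matches one extended grapheme cluster (base character plus its marks),
// so the number of matches approximates the user-perceived character count.
$clusters = preg_match_all('/\X/u', $text, $matches);

echo $bytes, " bytes, ", $codePoints, " code points, ", $clusters, " cluster(s)\n";
// prints "3 bytes, 2 code points, 1 cluster(s)"
```

If the intl extension is available, grapheme_strlen() is another way to get the cluster count.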

Thanks

Anirudh Zala

(30% of Internet resources,
used to deliver web-pages,
are wasted by unnecessary
tabs and spaces.)