[nycphp-talk] htmlentities charset bug

Cliff Hirsch cliff at pinestream.com
Wed Jan 23 13:16:48 EST 2008


On 1/23/08 12:58 PM, "Michael B Allen" <ioplex at gmail.com> wrote:
>> Reason: Invalid multibyte sequence in argument
>> Those curly single and double quotes are killers.
> 
> The problem isn't htmlentities, it's the charset your pages are
> emitted in. If you emit an HTML form in ISO-8859-1 and then submit
> garbage data, the database may store it as garbage and now you have a
> simple garbage-in / garbage-out scenario. Feed that to htmlentities and
> tell it it's ISO-8859-1 and you'll get an "Invalid multibyte sequence"
> error.

> if the browser was really sophisticated about it
> it could pop-up a dialog that warns you and asks you if you would like
> to transliterate those characters to ISO-8859-1 equivalent glyphs.
I wonder if there is any way to detect this on the server side. htmlentities()
certainly catches the problem, but it returns an empty string. Some sort of
friendlier filter that strips the bytes that are invalid in the expected
charset would be very cool.
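
For what it's worth, here is a rough sketch of what such a server-side check
and filter might look like. It assumes the mbstring extension is loaded and
that the page is meant to be UTF-8. The function names is_valid_utf8() and
strip_invalid_utf8() are made up for illustration; only the mb_* calls and
htmlentities() are real PHP functions.

```php
<?php
// Hypothetical helper: true when $input is well-formed UTF-8.
function is_valid_utf8($input)
{
    return mb_check_encoding($input, 'UTF-8');
}

// Hypothetical helper: drop any bytes that are not valid UTF-8 instead of
// letting htmlentities() fail on the whole string.
function strip_invalid_utf8($input)
{
    // Remember the current substitute setting so it can be restored.
    $previous = mb_substitute_character();
    // 'none' tells mbstring to delete invalid sequences rather than
    // replace them with '?'.
    mb_substitute_character('none');
    $clean = mb_convert_encoding($input, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);
    return $clean;
}

// Assumed example input: Windows-1252 curly double quotes (bytes 0x93 and
// 0x94) pasted into a form and treated as UTF-8 on the server.
$submitted = "\x93quoted\x94";

if (!is_valid_utf8($submitted)) {
    $submitted = strip_invalid_utf8($submitted);
}

echo htmlentities($submitted, ENT_QUOTES, 'UTF-8'), "\n";
```

Stripping loses the quotes entirely, so where the source charset is known it
may be kinder to convert instead, e.g. with mb_convert_encoding($input,
'UTF-8', 'Windows-1252').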

> I always use UTF-8.
I think I will too! Seems to be the way to go.





