[nycphp-talk] Character set issues revisited

Cliff Hirsch cliff at pinestream.com
Sat Oct 20 14:46:49 EDT 2007


John, Michael:

Thanks for all the info.

> And by '?' do you mean '?' or '�' ... there is a big difference.
What is the difference? I have seen both.

If anyone else is struggling with this, John and Michael have great points.
There are some great articles on Wikipedia and SitePoint regarding this
issue. Essentially, you need to look at the entire "publishing" chain --
from text editor, to database, to PHP, to Apache, to the browser.

In my case, this narrowed down to a few problem areas.

1. I conveniently "grabbed" some special characters from a Windows editor.
When, where? Who knows. But it's obvious. Lord knows I couldn't figure out
how to get that special squiggle thing any other way.

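2. The ISO-8859-1 and Windows-1252 char sets are very, very similar.
Naturally, it's all those lovely special characters that are the
difference: Windows-1252 puts printable characters (curly quotes, dashes,
the euro sign) in the 0x80-0x9F range, where ISO-8859-1 has only control
codes. From Wikipedia: "It is very common to mislabel text data with the
charset label ISO-8859-1, even though the data is really Windows-1252
encoded."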
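
To make the difference concrete, here is a minimal sketch that decodes the
same byte under both labels. It assumes the mbstring extension is
available; the variable names are only for illustration.

```php
<?php
// Byte 0x93: a left curly quote in Windows-1252, a control code in ISO-8859-1.
$byte = "\x93";

// Decode the same byte under each label and convert it to UTF-8.
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

echo bin2hex($asCp1252), "\n"; // e2809c, i.e. U+201C LEFT DOUBLE QUOTATION MARK
echo bin2hex($asLatin1), "\n"; // c293, i.e. U+0093, an invisible C1 control code
```

Same byte, two different characters, which is why a wrong label only shows
up on the "special" characters.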

3. My pages were being served as UTF-8 -- the Apache default on my server.
Sure, I set the meta tag to 8859-1. Not enough, because the charset in the
HTTP header takes precedence over the meta tag -- use:
 header('Content-type: text/html; charset=ISO-8859-1');
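
A slightly more defensive sketch of the same call, since header() only
works before any output has been sent. The helper name
content_type_header() is hypothetical, not something from PHP or from my
actual code.

```php
<?php
// Hypothetical helper (an assumption for illustration): builds the header line.
function content_type_header(string $charset): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Headers must go out before any body output, so check first.
if (!headers_sent()) {
    header(content_type_header('ISO-8859-1'));
}
```

Put this at the very top of the script, before any echo or stray
whitespace outside the PHP tags.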

4. I still have some of those special characters, and I know they are not in
the ISO-8859-1 char set, yet things are look'n fine. Puzzling, eh. Why? From
Wikipedia:

"Many web browsers and e-mail clients will interpret ISO-8859-1 control
codes as Windows-1252 characters in order to accommodate such mislabeling
but it is not a standard behaviour and care should be taken to avoid
generating these characters in ISO-8859-1 labeled content."
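
One way to find such characters before the browser papers over them is to
scan for bytes in the 0x80-0x9F range. This is only a rough sketch, assuming
the text is single-byte (ISO-8859-1 or Windows-1252), not UTF-8; the
function name has_c1_bytes() is made up for this example.

```php
<?php
// Hypothetical helper: true if the string contains bytes 0x80-0x9F, which are
// control codes in ISO-8859-1 but printable characters in Windows-1252.
// Assumes single-byte text; valid UTF-8 uses these bytes as continuation
// bytes, so it would trigger false positives.
function has_c1_bytes(string $text): bool
{
    return preg_match('/[\x80-\x9F]/', $text) === 1;
}

var_dump(has_c1_bytes("caf\xE9"));        // false: 0xE9 is valid in ISO-8859-1
var_dump(has_c1_bytes("\x93quoted\x94")); // true: Windows-1252 curly quotes
```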

5. And this is big!!!
I use the Firefox HTML Validator plug-in to validate my pages. It uses Tidy.
My pages came up as perfect -- till this problem hit when I switched
platforms. Yet when I ran them through the W3C validator, it barfed all
over them.

Use http://validator.w3.org to validate your pages.

Cliff




