[nycphp-talk] Character set issues revisited

Michael B Allen ioplex at gmail.com
Tue Oct 23 11:12:21 EDT 2007


On 10/23/07, tedd <tedd at sperling.com> wrote:
> At 7:21 PM -0400 10/21/07, John Campbell wrote:
> >The first thing to understand about character encoding is the overlap
> >between UTF-8 and 8859-1.  Below is a sample:
> >a - lower case a (Same in 8859-1 & UTF-8)
> >à - a grave (Available in 8859-1 & UTF-8 but with different byte values)
> >éí - Chinese character (Not in 8859-1, in UTF-8)
>
> A small clarification -- it's not really overlap,
> but rather UTF-8 is a super-set containing 8859-1
> like both contain ASCII.

Well, if you want to be pedantic about it, "overlap" is more accurate.
UTF-8 is a multibyte encoding of the Unicode charset. ISO-8859-1 is a
single-byte encoding of the ISO-8859-1 charset. So yes, Unicode is a
superset of ISO-8859-1, but the UTF-8 encodings of values above 0x7f
are not the same bytes as in ISO-8859-1. Only 0x00-0x7f (ASCII) encode
identically in both.
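
To make the byte-level difference concrete, here is a minimal sketch in
Python (not PHP, purely for illustration). The helper name encoded_hex
and the choice of U+4E2D as the sample CJK character are my own
assumptions, not something taken from the thread.

```python
# Compare how the same characters are stored in ISO-8859-1 and UTF-8.
# U+4E2D is an arbitrary CJK character picked for this sketch.
samples = ["a", "\u00e0", "\u4e2d"]  # 'a', a grave, one CJK character


def encoded_hex(ch, encoding):
    """Return the bytes of ch in the given encoding as hex, or None if it cannot be encoded."""
    try:
        return ch.encode(encoding).hex()
    except UnicodeEncodeError:
        return None


# Map each character to (ISO-8859-1 bytes, UTF-8 bytes).
table = {
    ch: (encoded_hex(ch, "iso-8859-1"), encoded_hex(ch, "utf-8"))
    for ch in samples
}

for ch, (latin1, utf8) in table.items():
    print(f"U+{ord(ch):04X}  iso-8859-1={latin1}  utf-8={utf8}")
```

The ASCII letter comes out as the single byte 61 in both encodings. The
a grave is e0 in ISO-8859-1 but the two bytes c3 a0 in UTF-8, and the
CJK character has no ISO-8859-1 form at all.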

Mike

-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/


