[nycphp-talk] Charsets are still driving me nuts
chsnyder at gmail.com
Thu Mar 6 17:15:40 EST 2008
I just found another potential gotcha when using Unicode throughout
your application: byte order marks in uploaded text files.
Turns out Word puts a byte order mark (BOM) at the beginning of all
Unicode files. Unicode-friendly tools ignore it. PHP's fgets() does
not: it hands back the BOM bytes as part of the first line.
Detecting and stripping the BOM is an interesting exercise, because
strlen('ï»¿') == 6, but it's really only 3 bytes long... not sure if
this is a bug or what, but it's certainly an annoyance.
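
A likely explanation for the 6, offered as an assumption rather than a diagnosis: strlen() counts bytes, and if the script itself is saved as UTF-8, the three visible characters 'ï»¿' are each stored as two bytes. The BOM on disk is the three raw bytes EF BB BF. A minimal sketch of detecting and stripping it, assuming PHP 7 or later; strip_utf8_bom() and UTF8_BOM are hypothetical names, not anything from PHP itself:

```php
<?php
// The UTF-8 BOM as raw bytes. Hex escapes keep the literal independent of
// whatever encoding the editor uses to save this script.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: return $text without a leading UTF-8 BOM, if any.
function strip_utf8_bom(string $text): string
{
    // strncmp() compares bytes, so this checks the first three bytes only.
    if (strncmp($text, UTF8_BOM, 3) === 0) {
        // Cast guards against substr() returning false on older PHP
        // versions when nothing follows the BOM.
        return (string) substr($text, 3);
    }
    return $text;
}

// The characters "ï»¿" as they are stored in a UTF-8 encoded source file:
// each of the three characters takes two bytes.
$pasted = "\xC3\xAF\xC2\xBB\xC2\xBF";

echo strlen(UTF8_BOM), "\n"; // byte length of the real BOM
echo strlen($pasted), "\n";  // byte length of the pasted look-alike
echo strip_utf8_bom(UTF8_BOM . "first line"), "\n";
```

The helper would be applied once, to the first line returned by fgets(), before any further parsing of the upload.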