[nycphp-talk] Charsets are still driving me nuts

csnyder chsnyder at
Thu Mar 6 17:15:40 EST 2008

I just found another potential gotcha when using Unicode throughout
your application: byte order marks in uploaded text files.

Turns out Word puts a byte order mark (BOM) at the beginning of all
Unicode files. Unicode-friendly tools ignore it. PHP's fgets() does not.

Detecting and stripping the BOM is an interesting exercise, because
strlen('ï»¿') == 6, but it's really only 3 bytes long... not sure if
this is a bug or what, but it's certainly an annoyance.
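
For what it's worth, here is a minimal sketch of the detect-and-strip step,
assuming the uploads are UTF-8. The UTF-8 BOM is the three bytes EF BB BF.
One likely explanation for the 6-byte count is that the BOM was pasted into
the source as the three characters it shows up as in a Latin-1 viewer, and
each of those takes 2 bytes once the source file itself is saved as UTF-8.
Writing the BOM as byte escapes sidesteps that. The names UTF8_BOM and
strip_utf8_bom() are made up for this example, not anything built into PHP.

```php
<?php
// The UTF-8 byte order mark, written as explicit byte escapes so the
// editor's own encoding cannot change it. Hypothetical constant name.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: return $line without a leading UTF-8 BOM.
// Lines that do not start with the BOM are returned unchanged.
function strip_utf8_bom(string $line): string
{
    if (strncmp($line, UTF8_BOM, 3) === 0) {
        // The cast covers older PHP versions, where substr() can return
        // false when nothing is left after the BOM.
        return (string) substr($line, 3);
    }
    return $line;
}

// Simulate an uploaded file that starts with a BOM, using an in-memory stream.
$fh = fopen('php://memory', 'w+');
fwrite($fh, UTF8_BOM . "first line\nsecond line\n");
rewind($fh);

// Only the first line of a file can carry the BOM, so strip it there.
$first  = strip_utf8_bom(fgets($fh));
$second = fgets($fh);
fclose($fh);
```

Only the UTF-8 BOM is handled here. UTF-16 and UTF-32 files start with
different byte sequences and would need their own checks.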

Chris Snyder

