[nycphp-talk] enforcing Latin-1 input

Allen Shaw ashaw at polymerdb.org
Wed Nov 23 14:40:53 EST 2005


Mikko Rantalainen wrote:
> Allen Shaw wrote:
>> [snip] what you think of this half-baked idea: [snip...]
>
> You cannot trust that behavior. Specification only says (IIRC) that 
> the user agent MUST not send characters outside iso-8859-1 on such a 
> form. 

Okay, I'm in way over my head here.  I'd like to get my hands on that 
spec -- would you have a link or some reasonably unique keywords to 
google for (w3c, character encoding, specification, etc. don't seem to 
be cutting it...)?  I should just dig in there and understand what I'm 
doing before trying to implement anything, I think.

> I guess that what I'm trying to tell you is that to *force* 
> iso-8859-1 input only, you're going to have to use UTF-8 for the 
> form and you're going to have to use UTF-8 internally. That's the 
> only way you can really get in iso-8859-1 encoding the same data the 
> user really tried to input.

What I'm really trying to do is not encode their input into Latin-1, but 
figure out if they _entered_ Latin-1 characters in the form and if so 
accept it, or if not, reject it and tell them why.  If we just encode 
their Chinese characters into Latin-1, neither I nor anybody around me 
will be able to read it, not in any encoding or character set, because 
of human language limitations; so I want to require the user to enter 
either common western characters only or nothing at all.  Anyway, maybe 
it's a fool's errand...
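
To make the accept-or-reject idea concrete, here is a rough sketch of the check I have in mind. It is written in Python only for brevity, and it assumes the form is served and submitted as UTF-8 (as Mikko suggests), so the raw field value arrives as UTF-8 bytes. The function names non_latin1_chars and validate_field are made up for illustration; this is not a tested implementation, just the shape of the logic.

```python
def non_latin1_chars(raw: bytes) -> list:
    """Return the distinct characters in a UTF-8 field value that
    cannot be represented in ISO-8859-1 (Latin-1)."""
    # Assumption: the browser submitted the form as UTF-8.
    # Malformed input raises UnicodeDecodeError here.
    text = raw.decode("utf-8")
    offending = []
    for ch in text:
        try:
            ch.encode("iso-8859-1")
        except UnicodeEncodeError:
            if ch not in offending:
                offending.append(ch)
    return offending


def validate_field(raw: bytes) -> tuple:
    """Return (accepted, message). Reject instead of silently converting."""
    try:
        bad = non_latin1_chars(raw)
    except UnicodeDecodeError:
        return False, "Input is not valid UTF-8."
    if bad:
        return False, ("Please use western (Latin-1) characters only; "
                       "not accepted: " + " ".join(bad))
    return True, ""
```

With this, "café" would be accepted and Chinese text would be rejected with a message listing the offending characters. Note that ISO-8859-1 has no euro sign, so "€" would be rejected as well; whether that is acceptable depends on the application.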

Thanks for bouncing this around with me.  If I do go with any particular 
approach I'll let you know as an update.

- Allen

-- 
Allen Shaw
Polymer (http://polymerdb.org)


