[nycphp-talk] User Input Data scrubbing

Michele Waldman mmwaldman at
Fri Nov 28 17:12:21 EST 2008

I could have sworn I sent this out already.


My question, my most important one of the day.


If I pass a string from HTML/JavaScript to PHP, does PHP not assume that the
string is in the default character set, containing only 255 characters?


Or do I have to cast that string to a specific character set?


I want to force my strings to use only those 255 characters.  That way, I know
what I'm dealing with: a finite set of characters.
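
For context: a PHP string is a plain sequence of bytes with no character set attached, so nothing is cast or assumed when the data arrives. What the script receives is whatever bytes the browser sent. A minimal sketch of an explicit check follows, assuming the allowed set is printable ASCII plus carriage return and line feed. The helper name is_plain_ascii is hypothetical, not something from this thread.

```php
<?php
// Hypothetical helper: true only if every byte is printable ASCII (0x20-0x7E),
// a carriage return, or a line feed.
function is_plain_ascii(string $input): bool
{
    // No /u modifier, so the pattern is matched byte by byte.
    return preg_match('/\A[\x20-\x7E\r\n]*\z/', $input) === 1;
}

var_dump(is_plain_ascii("Hello, world"));  // bool(true)
var_dump(is_plain_ascii("caf\xC3\xA9"));   // bool(false): the UTF-8 bytes for é fall outside the range
```

Rejecting input this way is a validation step only. It does not replace escaping when the value is later written into SQL or HTML.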





From: talk-bounces at [mailto:talk-bounces at] On
Behalf Of Elijah Insua
Sent: Friday, November 28, 2008 3:27 PM
To: NYPHP Talk
Subject: Re: [nycphp-talk] User Input Data scrubbing



SQL injection and HTML injection are two separate issues.

SQL injection is something like a user posting ';DELETE FROM users; as input,
which, if pasted unescaped into a query, deletes all of your user accounts.
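
One common defense, which is a different technique from the character replacement discussed in this thread, is to bind user input as a query parameter so it is never parsed as SQL. The sketch below assumes the pdo_sqlite extension is available and uses an in-memory database with a hypothetical users table as a stand-in for the real one.

```php
<?php
// Assumption: pdo_sqlite is installed; the in-memory database and the
// "users" table are stand-ins for illustration only.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

$hostile = "'; DELETE FROM users; --";

// The value is bound as a parameter, so it is treated as data, not as SQL.
$insert = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
$insert->execute([':name' => 'alice']);
$insert->execute([':name' => $hostile]);

$count  = (int) $pdo->query('SELECT COUNT(*) FROM users')->fetchColumn();
$stored = $pdo->query('SELECT name FROM users WHERE id = 2')->fetchColumn();

echo $count, "\n";   // 2: both rows exist, nothing was deleted
echo $stored, "\n";  // the hostile string, stored verbatim
```

With bound parameters the stored value keeps its original characters, so HTML escaping can be applied separately at display time.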

HTML injection, or cross-site scripting (XSS), is more along the lines of what
you are talking about.  There are tons of libraries out there
that attempt to kill off as many of these as possible.

As far as your 255 character theory, it is not completely true.  There are
other character encodings such as UTF-8, which can represent
more than a million Unicode code points.  I would seriously invest some time
into finding a library that you can integrate.
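
To make the byte versus character distinction concrete, here is a small sketch using only core PHP functions. It relies on the PCRE /u modifier, which treats the subject as UTF-8 and fails on malformed input. The sample strings are illustrative.

```php
<?php
// "é" is the single code point U+00E9, encoded in UTF-8 as the two bytes C3 A9.
$text = "caf\xC3\xA9";

$bytes = strlen($text);                   // strlen() counts bytes
$chars = preg_match_all('/./us', $text);  // with /u, each match is one code point

// With /u, PCRE rejects malformed UTF-8 and preg_match() returns false.
$validUtf8   = preg_match('//u', $text) === 1;
$invalidUtf8 = preg_match('//u', "caf\xE9") === 1;  // lone Latin-1 byte, not valid UTF-8

var_dump($bytes, $chars, $validUtf8, $invalidUtf8);
// prints int(5), int(4), bool(true), bool(false)
```

So a string that looks like four characters to the user can arrive as five bytes, and a byte-oriented replacement function will see each of those bytes separately.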

- Elijah

On Fri, Nov 28, 2008 at 3:04 PM, Michele Waldman <mmwaldman at>

Could y'all repost any responses to this?  Apparently, my new email address
wasn't subscribed to the mailing list.



From: Michele Waldman [mailto:mmwaldman at] 
Sent: Friday, November 28, 2008 2:06 PM
To: 'NYPHP Talk'
Subject: User Input Data scrubbing


I'm trying to scrub data input to insert into a database which I will later
display on the website.


In order to prevent SQL injections and HTML injections into the code, I
figured I'd just replace non-alphanumeric characters with their HTML special
character codes and remove any control characters altogether except
carriage return.
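
The control-character step described above could be sketched as follows. This is only an illustration under the stated rule (drop every ASCII control character except carriage return); the helper name strip_control_chars is hypothetical.

```php
<?php
// Hypothetical helper: drops ASCII control characters (0x00-0x1F and 0x7F)
// except carriage return (0x0D), which the post wants to keep.
function strip_control_chars(string $input): string
{
    return preg_replace('/[\x00-\x0C\x0E-\x1F\x7F]/', '', $input);
}

// The NUL, tab, and line feed are removed; the carriage return survives.
echo bin2hex(strip_control_chars("a\x00b\tc\r\nd")), "\n";  // 6162630d64
```

Note that this rule also removes line feeds and tabs, which may or may not be wanted for multi-line text fields.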


The ASCII character codes, counting the 8-bit extended range, only go up to 255.


However, there are lots more characters in html.


If the user submits a string, generated from HTML, that uses
characters outside of the ASCII character codes, what do those get
translated to in the string?  A garbage character?


Is that a concern?  Or is my only concern those 255 characters in the ASCII
chart?  I'm thinking the 255 characters cover it all.  The characters are a
finite set which was predefined long ago, unless that changes in the
future, right?  This means scrubbing the data is a short function.


I'm not using mysql_real_escape_string, because I replace all ' and " with
their html character code.


I'm not using htmlspecialchars, because it wasn't thorough enough.  I simply
wrote a function that replaces just about every character with its HTML
character code.
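
For comparison, here is a sketch of htmlspecialchars called with ENT_QUOTES, which converts the five characters that carry syntactic meaning in HTML text and quoted attribute values: &, <, >, " and '. The wrapper name html_escape and the UTF-8 charset argument are assumptions for illustration. Other output contexts, such as inline scripts or URLs, need their own escaping.

```php
<?php
// Hypothetical helper: escape a value at the point where it is written into HTML.
function html_escape(string $value): string
{
    // ENT_QUOTES converts both double and single quotes; 'UTF-8' assumes
    // the page is served as UTF-8.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

echo html_escape('<script>alert("x")</script> & \'quoted\''), "\n";
// &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt; &amp; &#039;quoted&#039;
```

Applying this when the value is displayed, instead of before it is stored, keeps the database copy unmodified.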


I'm doing this in php after the data is passed to me.


Now, in the case of ajax, I just need to come up with a good approach for
checking the data received from php, which may vary depending on the type of
ajax used.



New York PHP User Group Community Talk Mailing List

