In this article we present our best practice for preparing data submitted from a form for use in an SQL query and preparing that data for display in a browser.
Storing data in a database requires escaping certain characters that could either be interpreted by the database engine as a command or that might generate an error. For example, say a user has entered into a form the following data:
My Dear Aunt Sally
Your PHP code might be something like:
$SQL = "INSERT INTO MyTable (MyColumn) VALUES ('{$_POST['MyData']}')";
where $_POST['MyData'] holds the phrase My Dear Aunt Sally.
The resulting SQL statement will look like this:
INSERT INTO MyTable (MyColumn) VALUES ('My Dear Aunt Sally')
Notice how the value is delimited by single quotes, thereby marking the beginning and ending of the value to be inserted into the table. But what happens if the value you want to insert is the following?
My Dear Aunt Sally's Picnic Basket
The resulting SQL statement will look like this:
INSERT INTO MyTable (MyColumn) VALUES ('My Dear Aunt Sally's Picnic Basket')
Notice that we now have three single quotes. The database engine will see the first single quote, the value My Dear Aunt Sally and will then see the second single quote. It will assume that the phrase s Picnic Basket') is part of -- what? The database engine doesn't know. It will then generate an error. Consequently, we need to escape the single quote that appears in the middle of the value. (See Note 1 below)
PHP provides several different functions for handling this situation which is referred to as escaping. It is a method for marking certain characters so that they are not interpreted as part of a command.
Best Practice: Do not use the function addslashes but, instead, use a function that is specific to your database to escape special characters. (See Note 2 below)
Here is an example using mysql_escape_string:
Assuming that $_POST['MyData'] is My Dear Aunt Sally's Picnic Basket and our code is the following:
$data = mysql_escape_string($_POST['MyData']); $SQL = "INSERT INTO MyTable (MyColumn) VALUES ('$data')";
The resulting SQL statement will look like this:
INSERT INTO MyTable (MyColumn) VALUES ('My Dear Aunt Sally\'s Picnic Basket')
which is the desired result.
What are magic_quotes and what's wrong with magic_quotes?
Best Practice: Turn off magic quotes! There are three php.ini settings that relate to magic_quotes:
By turning off all three "magic_quote" settings, you eliminate:
There are two ways to turn off these settings:
function fix_magic_quotes ($var = NULL, $sybase = NULL) { // if sybase style quoting isn't specified, use ini setting if ( !isset ($sybase) ) { $sybase = ini_get ('magic_quotes_sybase'); } // if no var is specified, fix all affected superglobals if ( !isset ($var) ) { // if magic quotes is enabled if ( get_magic_quotes_gpc () ) { // workaround because magic_quotes does not change $_SERVER['argv'] $argv = isset($_SERVER['argv']) ? $_SERVER['argv'] : NULL; // fix all affected arrays foreach ( array ('_ENV', '_REQUEST', '_GET', '_POST', '_COOKIE', '_SERVER') as $var ) { $GLOBALS[$var] = fix_magic_quotes ($GLOBALS[$var], $sybase); } $_SERVER['argv'] = $argv; // turn off magic quotes, this is so scripts which // are sensitive to the setting will work correctly ini_set ('magic_quotes_gpc', 0); } // disable magic_quotes_sybase if ( $sybase ) { ini_set ('magic_quotes_sybase', 0); } // disable magic_quotes_runtime set_magic_quotes_runtime (0); return TRUE; } // if var is an array, fix each element if ( is_array ($var) ) { foreach ( $var as $key => $val ) { $var[$key] = fix_magic_quotes ($val, $sybase); } return $var; } // if var is a string, strip slashes if ( is_string ($var) ) { return $sybase ? str_replace ('\'\'', '\'', $var) : stripslashes ($var); } // otherwise ignore return $var; }
The code above will strip the slashes from the superglobal arrays. It will then set all three ini settings to 0 (off), which will effectively simulate the server having all three settings turned off in the first place, and prevent your data from being mangled. However, since the ini_set function does not permanently change the setting, it is necessary to call the function at the beginning of each script.
See: http://php.net/ini_set.
Once you have retrieved data from a database, you will want to display it. But in the same way that the data we were inserting into the database may contain characters which 'break' our SQL, there may be characters which will 'break' our output, so we need to 'escape' (also called 'converting' or 'encoding') our data for output.
The most common use of PHP is to display the data in an HTML page. However there are many characters which will wreak havoc with HTML (called HTML 'entities' because they have special meaning in HTML). Htmlentities will convert these characters to their html-safe equivalents. The most notorious of these are the greater than (>) and less than (<) characters.
Best Practice: Always use htmlentities for displaying data in a browser. Do not use htmlentities for storing data. (See Note 3 below)
For more information on htmlentities see http://php.net/htmlentities.
This function works in the same way as htmlentities but only converts a subset of the HTML entities (single quotes, double quotes, ampersands, greater than (>) and less than (<) characters). In some situations this can be the desired behavior, but most of the time you will want to use htmlentities.
For more information on htmlspecialchars see http://php.net/htmlspecialchars.
There are times when you will be retrieving data from a database which will not be displayed in a browser but will, instead, be used for other purposes. Therefore, you should not use htmlentities or htmlspecialchars when:
You will notice that two commands - addslashes and stripslashes - are conspicuously absent from this article. If you follow our best practice recommendations, you will not need to use either of them with the one exception of stripslashes which is used in the fix_magic_quotes function.
1- Handling embedded single quotes is not the only reason to escape certain characters. An equally important reason for escaping characters is to prevent SQL Injection, that is, the technique whereby a malicious user may try to insert SQL commands that may compromise the security of your web site. (The issue of SQL Injection will be examined in a future PHundamentals article. Though our emphasis has been on MySQL, it is an issue with all databases.)
2- The main reason for using mysql_real_escape_string instead of addslashes is that the former will handle many more characters that require special handling.(See: http://www.mysql.com/doc/en/String_syntax.html.) Addslashes will only escape " (double quote), ' (single quote) \ (backslash) and NUL (the null byte) with a backslash. Mysql_real_escape_string will take into account the character set of the current connection, and escape characters as needed. However, both mysql_escape_string and mysql_real_escape_string will additional characters that have a special meaning to MySQL, such as:
In addition, mysql_real_escape_string will encode any newlines into their encoded (\n or \r\n) counterpart.
All of the database-specific functions (e.g., pg_escape_string) include these kinds of special cases for the
particular database, whereas addslashes does not.
For simple data it will work, but there is the possibility you may end up in a situation
where addslashes alone will fail.
3- Best practice is to always store the data as "display-agnostic" thereby assuring you the maximum amount of flexibility while maintaining the data in its "raw" state. Therefore, if you need to display the data in a browser, you would use htmlentities. If the data is to be written to a CSV file, it can be retrieved as-is and then encoded accordingly with commas and quotes.
Contributors to this note include the following:
free PHP thumbnail maker CSS & Javascript minify gzip pipeline online API and console
Free API and developer tools console for PageSpeed optimization.