Functions for Storing Data Submitted From a Form and Displaying Data from a Database

NYPHP - PHundamentals

In this article we present our best practice for preparing data submitted from a form for use in an SQL query and preparing that data for display in a browser.

Storing Data Submitted From a Form

1) mysql_escape_string, mysql_real_escape_string & and other database-specific functions

Storing data in a database requires escaping certain characters that could either be interpreted by the database engine as a command or that might generate an error. For example, say a user has entered into a form the following data:

	My Dear Aunt Sally
	

Your PHP code might be something like:

	$SQL = "INSERT INTO MyTable (MyColumn) VALUES ('{$_POST['MyData']}')";
	

where $_POST['MyData'] holds the phrase My Dear Aunt Sally.
The resulting SQL statement will look like this:

	INSERT INTO MyTable (MyColumn) VALUES ('My Dear Aunt Sally')
	

Notice how the value is delimited by single quotes, thereby marking the beginning and ending of the value to be inserted into the table. But what happens if the value you want to insert is the following?

	My Dear Aunt Sally's Picnic Basket
	

The resulting SQL statement will look like this:

	INSERT INTO MyTable (MyColumn) VALUES ('My Dear Aunt Sally's Picnic Basket')
	

Notice that we now have three single quotes. The database engine will see the first single quote, the value My Dear Aunt Sally and will then see the second single quote. It will assume that the phrase s Picnic Basket') is part of -- what? The database engine doesn't know. It will then generate an error. Consequently, we need to escape the single quote that appears in the middle of the value. (See Note 1 below)

PHP provides several different functions for handling this situation which is referred to as escaping. It is a method for marking certain characters so that they are not interpreted as part of a command.

Best Practice: Do not use the function addslashes but, instead, use a function that is specific to your database to escape special characters. (See Note 2 below)

  • If you are using MySQL:
    Use mysql_escape_string if you are using a version of PHP prior to ver. 4.3.
    Use mysql_real_escape_string if you are using PHP ver. 4.3 or better.
    The former will escape more than just a single quote, double quote, backslash and NUL byte. The latter will take into account the character set of the connection that is in use.
    For more information see http://php.net/mysql_escape_string.
    For more information see http://php.net/mysql_real_escape_string.
  • If you are using PostgreSQL:
    Use pg_escape_string.
    For more information see http://php.net/pg_escape_string.
  • If you are using other databases:
    You might want to consider using something like ADOdb, specifically the qstr function or PEAR's quote function. These libraries will also work well with MySQL, PostgreSQL and other databases.
    For more information on ADOdb, see http://php.weblogs.com/ADOdb.
    For more information on PEAR, see http://pear.php.net/.

Here is an example using mysql_escape_string:

Assuming that $_POST['MyData'] is My Dear Aunt Sally's Picnic Basket and our code is the following:

	$data = mysql_escape_string($_POST['MyData']);
	$SQL = "INSERT INTO MyTable (MyColumn) VALUES ('$data')";
	

The resulting SQL statement will look like this:

	INSERT INTO MyTable (MyColumn) VALUES ('My Dear Aunt Sally\'s Picnic Basket')
	

which is the desired result.

2) magic_quotes

What are magic_quotes and what's wrong with magic_quotes?

  1. What are magic quotes?
    Magic quotes is a PHP configuration that, when turned on, alters data before it becomes available to your script by prepending backslashes to characters that require escaping, such as single quote ('), the double quote ("), the backslash ( \ ), and NUL (the NUL byte with character code zero). In other words, it automatically escapes characters for you.
  2. What's wrong with using magic quotes?
    The biggest problem with magic quotes is that it reduces the amount of control that the developer has over the data as it "automagically" manipulates your data without your being aware of it. For further information on the "evils" of magic quotes, see http://www.webmasterstop.com/tutorials/magic-quotes.shtml.

Best Practice: Turn off magic quotes! There are three php.ini settings that relate to magic_quotes:

  1. magic_quotes_gpc: When turned on, PHP will alter the contents of the $_ENV, $_REQUEST, $_GET, $_POST, $_COOKIE and $_SERVER superglobal variables automagically if they contain a '(single quote), "(double quote), \(backslash) and NUL's by adding a backslash before your script even gets to see them.
  2. magic_quotes_sybase: When turned on, the superglobals will instead be escaped 'sybase-style', which means that only single quotes are escaped, and this time with a second single quote instead of a backslash. So, while stripslashes could be used to reverse the effect of magic_quotes_gpc, if magic_quotes_sybase is enabled, you would need to instead replace every instance of '' (2 single quotes) with '(1 single quote).
  3. magic_quotes_runtime: When turned on, this setting manipulates data returned from a database, text file, etc. and, therefore, requires the use of the stripslashes function to "clean up" data coming from a database or text file.

By turning off all three "magic_quote" settings, you eliminate:

  • The possiblity of your data being stored with multiple backslashes.
  • The need to replace two single quotes with one single quote (in the case of magic_quotes_sybase).
  • The need to use the stripslashes function on data you retrieve from a form or database.

There are two ways to turn off these settings:

  1. If you have control over the php.ini file you can set these configuration options to "Off".
  2. If you do not have control over the php.ini file, such as in the case of a shared server environment, you can use the function provided below. You would need to include this function in all of your scripts and it would need to be one of the very first functions called by your script.
function fix_magic_quotes ($var = NULL, $sybase = NULL)
{
	// if sybase style quoting isn't specified, use ini setting
	if ( !isset ($sybase) )
	{
		$sybase = ini_get ('magic_quotes_sybase');
	}

	// if no var is specified, fix all affected superglobals
	if ( !isset ($var) )
	{
		// if magic quotes is enabled
		if ( get_magic_quotes_gpc () )
		{
			// workaround because magic_quotes does not change $_SERVER['argv']
			$argv = isset($_SERVER['argv']) ? $_SERVER['argv'] : NULL; 

			// fix all affected arrays
			foreach ( array ('_ENV', '_REQUEST', '_GET', '_POST', '_COOKIE', '_SERVER') as $var )
			{
				$GLOBALS[$var] = fix_magic_quotes ($GLOBALS[$var], $sybase);
			}

			$_SERVER['argv'] = $argv;

			// turn off magic quotes, this is so scripts which
			// are sensitive to the setting will work correctly
			ini_set ('magic_quotes_gpc', 0);
		}

		// disable magic_quotes_sybase
		if ( $sybase )
		{
			ini_set ('magic_quotes_sybase', 0);
		}

		// disable magic_quotes_runtime
		set_magic_quotes_runtime (0);
		return TRUE;
	}

	// if var is an array, fix each element
	if ( is_array ($var) )
	{
		foreach ( $var as $key => $val )
		{
			$var[$key] = fix_magic_quotes ($val, $sybase);
		}

		return $var;
	}

	// if var is a string, strip slashes
	if ( is_string ($var) )
	{
		return $sybase ? str_replace ('\'\'', '\'', $var) : stripslashes ($var);
	}

	// otherwise ignore
	return $var;
}

The code above will strip the slashes from the superglobal arrays. It will then set all three ini settings to 0 (off), which will effectively simulate the server having all three settings turned off in the first place, and prevent your data from being mangled. However, since the ini_set function does not permanently change the setting, it is necessary to call the function at the beginning of each script.

See: http://php.net/ini_set.

Retrieving Data For Display in a Browser

Once you have retrieved data from a database, you will want to display it. But in the same way that the data we were inserting into the database may contain characters which 'break' our SQL, there may be characters which will 'break' our output, so we need to 'escape' (also called 'converting' or 'encoding') our data for output.

1) htmlentities

The most common use of PHP is to display the data in an HTML page. However there are many characters which will wreak havoc with HTML (called HTML 'entities' because they have special meaning in HTML). Htmlentities will convert these characters to their html-safe equivalents. The most notorious of these are the greater than (>) and less than (<) characters.

Best Practice: Always use htmlentities for displaying data in a browser. Do not use htmlentities for storing data. (See Note 3 below)

For more information on htmlentities see http://php.net/htmlentities.

2) htmlspecialchars

This function works in the same way as htmlentities but only converts a subset of the HTML entities (single quotes, double quotes, ampersands, greater than (>) and less than (<) characters). In some situations this can be the desired behavior, but most of the time you will want to use htmlentities.

For more information on htmlspecialchars see http://php.net/htmlspecialchars.

Retrieving Data For Plain Text & CSV Files

Plain Text & CSV Files

There are times when you will be retrieving data from a database which will not be displayed in a browser but will, instead, be used for other purposes. Therefore, you should not use htmlentities or htmlspecialchars when:

  • You retrieve data from a database and you want to use it as plain text (in an internal report or e-mail message).
  • You retrieve data from a database and you want to generate a CSV file. Just do the CSV encoding transformations with commas and quotes.

What Happened to Addslashes & Stripslashes?

You will notice that two commands - addslashes and stripslashes - are conspicuously absent from this article. If you follow our best practice recommendations, you will not need to use either of them with the one exception of stripslashes which is used in the fix_magic_quotes function.


Notes

1- Handling embedded single quotes is not the only reason to escape certain characters. An equally important reason for escaping characters is to prevent SQL Injection, that is, the technique whereby a malicious user may try to insert SQL commands that may compromise the security of your web site. (The issue of SQL Injection will be examined in a future PHundamentals article. Though our emphasis has been on MySQL, it is an issue with all databases.)

2- The main reason for using mysql_real_escape_string instead of addslashes is that the former will handle many more characters that require special handling.(See: http://www.mysql.com/doc/en/String_syntax.html.) Addslashes will only escape " (double quote), ' (single quote) \ (backslash) and NUL (the null byte) with a backslash. Mysql_real_escape_string will take into account the character set of the current connection, and escape characters as needed. However, both mysql_escape_string and mysql_real_escape_string will additional characters that have a special meaning to MySQL, such as:

  • \0 An ASCII 0 (NUL) character.
  • \' A single quote (`'') character.
  • \" A double quote (`"') character.
  • \b A backspace character.
  • \n A newline character.
  • \r A carriage return character.
  • \t A tab character.
  • \z ASCII(26) (Control-Z). This character can be encoded to allow you to work around the problem that ASCII(26) stands for END-OF-FILE on Windows. (ASCII(26) will cause problems if you try to use mysql database < filename.)
  • \\ A backslash (`\') character.

In addition, mysql_real_escape_string will encode any newlines into their encoded (\n or \r\n) counterpart.

All of the database-specific functions (e.g., pg_escape_string) include these kinds of special cases for the particular database, whereas addslashes does not. For simple data it will work, but there is the possibility you may end up in a situation where addslashes alone will fail.

3- Best practice is to always store the data as "display-agnostic" thereby assuring you the maximum amount of flexibility while maintaining the data in its "raw" state. Therefore, if you need to display the data in a browser, you would use htmlentities. If the data is to be written to a CSV file, it can be retrieved as-is and then encoded accordingly with commas and quotes.


Contributors to this note include the following:

  • Dan Cech
  • Tim Gales
  • John Lacey
  • David Mintz
  • Chris Shiflett
  • David Sklar
  • Hans Zaunere
  • the PHundamentals team: Jeff Siegel, Mike Southwell