[nycphp-talk] Filtering input to be appended inside email

Mikko Rantalainen mikko.rantalainen at peda.net
Tue Sep 13 10:19:18 EDT 2005


Daniel Convissor wrote:
> Hi Michael:
> On Mon, Sep 12, 2005 at 12:41:12PM -0400, Michael Southwell wrote:
>>
>>The point is simply to identify which scripts have sent emails to the 
>>known-bad addresses; those are the vulnerable ones.
> 
> I'm afraid that will lead people into both a false sense of security and 
> using email address blacklists.  Folks should audit their email scripts, 
> period.

I agree. Broken code is broken code. If you aren't sure if your 
email script works correctly, take it offline immediately.

>>There were other problems as well, which I noted in my polished 
>>version.  We need an officially sanctioned version of the function 
>>before we can post anything.
> 
> Agreed.  Here's what I think is a good starting point for discussion...
> 
> <?php
> // untested!!!!
> // MUST do isset() checks on all of these first!
> // left out for brevity.
> 
> if (eregi('^[a-z0-9_.=+-]+@([a-z0-9-]+\.)+([a-z]{2,6})$', $_POST['address'])) {
>     $address = $_POST['address'];
> } else {
>     echo 'bad email';
>     exit;

That looks pretty simple, but it rejects a large share of valid 
email addresses.

I'd rather create two functions like

/**
	takes string $input_email and returns RFC 2822 section 3.4
	compatible address or empty string if input cannot be
	handled.
	http://rfc.net/rfc2822.html#s3.4.
*/
function getSafeEmail($input_email) { ... return $safe_email; }

and

/**
	takes string $input_header and encodes it as a single header
	to be used for mailing.
	http://rfc.net/rfc2822.html#s2.2.3.
*/
function getSafeHeader($input_header) { ... return $safe_header; }

and I'd put all input through these functions, for example 
$from = getSafeEmail($_POST["FROM"]) or so.

Of these, the first one is much harder to implement correctly. A 
simple implementation could accept only a limited addr-spec of the 
form
	dot-atom "@" dot-atom
where dot-atom is defined at http://rfc.net/rfc2822.html#s3.2.4. 
This is much simpler than the full address specification in 
http://rfc.net/rfc2822.html#s3.4.

This "simple" format wouldn't allow all valid email addresses, but 
at least it would allow addresses like
mikko.rantalainen+nyphp@peda.net
unlike many complex regexes that are meant to filter email addresses.

A simple, untested implementation would look like

function getSafeEmail($input_email)
{
	# atext characters, http://rfc.net/rfc2822.html#s3.2.4.
	# "-" is last and "^" is not first, so both stay literal
	$atext = "a-z0-9!#\$%&'*/=?^_`{|}~+-";
	# filter extra characters off, but keep "." and "@"
	$safe_email = preg_replace("@[^.\@{$atext}]@i","",$input_email);

	# dot-atom "@" dot-atom, anchored at both ends
	if (preg_match("@^[{$atext}]+(\.[{$atext}]+)*\@[{$atext}]+(\.[{$atext}]+)+\z@i",$safe_email))
		return $safe_email;
	else
		return ""; # error
}
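
To see what the dot-atom "@" dot-atom format buys, here is a small 
comparison sketch. Both helper names (matchesQuotedPattern and 
isDotAtomAddress) are hypothetical and exist only for this example. 
The first is my translation of the quoted eregi() pattern to 
preg_match(), since the ereg functions were removed in PHP 7. The 
second only validates and never rewrites its input, which differs 
from getSafeEmail.

```php
<?php
// Both helpers are hypothetical names used only for this comparison.

// The quoted eregi() pattern, translated to preg_match() with the "i" flag.
function matchesQuotedPattern($email)
{
    return preg_match('/^[a-z0-9_.=+-]+@([a-z0-9-]+\.)+([a-z]{2,6})$/i', $email) === 1;
}

// dot-atom "@" dot-atom, validating only (the input is never rewritten).
function isDotAtomAddress($email)
{
    // atext from RFC 2822 section 3.2.4; "-" comes last so it stays literal
    $atext = "a-z0-9!#\$%&'*/=?^_`{|}~+-";
    // \z anchors at the very end, so a trailing newline is not accepted
    $pattern = "@^[{$atext}]+(?:\.[{$atext}]+)*\@[{$atext}]+(?:\.[{$atext}]+)+\z@i";
    return preg_match($pattern, $email) === 1;
}

$samples = array(
    "mikko.rantalainen+nyphp@peda.net",
    "o'brien@example.com",
    "a..b@example.com",
);
foreach ($samples as $email) {
    printf("%-34s quoted: %-3s dot-atom: %s\n", $email,
        matchesQuotedPattern($email) ? "yes" : "no",
        isDotAtomAddress($email) ? "yes" : "no");
}
```

Under these assumptions the first address passes both checks, the 
apostrophe address passes only the dot-atom check, and the 
double-dot address passes only the quoted pattern.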


For the second function there are two ways to make sure that 
$input_header contains exactly one valid header: either remove all 
line feeds from the input, or append a space after every line feed, 
which turns the whole input into a single header folded over 
multiple lines (http://rfc.net/rfc2822.html#s2.2.3.). I'll choose 
the latter method for this implementation. Again, this is untested.

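function getSafeHeader($input_header)
{
	# split as defined in http://rfc.net/rfc2822.html#s2.2.
	$parts = explode(":",$input_header,2);
	if (count($parts) != 2)
		return ""; # no colon, so this is not a header
	list($name,$value) = $parts;

	# verify header name: printable US-ASCII (33-126), the colon
	# is already excluded by the split above; \z instead of $
	# so that a trailing line feed in the name is rejected
	if (!preg_match("@^[".chr(33)."-".chr(126)."]+\z@",$name))
		return "";

	# header cannot contain a bare CRLF
	# our implementation strips out CRs, makes every LF safe by
	# following it with a space, and then reinserts the CRs
	$value = preg_replace("@\r@","",trim($value));
	$value = preg_replace("@\n@","\n ",$value);
	$value = preg_replace("@\n@","\r\n",$value);
	
	$safe_header = $name.": ".$value."\r\n";
	return $safe_header;
}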
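
To show what the line feed handling does to an injection attempt, 
here is a minimal sketch of just that step. The helper name 
foldHeaderValue is hypothetical, and I use str_replace() where the 
function above uses preg_replace(); the three steps are otherwise 
the same.

```php
<?php
// Hypothetical helper: only the CR/LF handling, nothing else.
function foldHeaderValue($value)
{
    $value = str_replace("\r", "", trim($value)); // drop every CR
    $value = str_replace("\n", "\n ", $value);    // continuation lines start with a space
    return str_replace("\n", "\r\n", $value);     // restore CRLF line endings
}

// An attacker tries to smuggle a second header into the subject.
$attack = "Hello\r\nBcc: victim@example.com";
$folded = foldHeaderValue($attack);

// The "Bcc:" text now sits on a continuation line of the Subject header.
echo "Subject: " . $folded . "\r\n";
```

Because every line after the first begins with a space, a parser 
that follows the folding rules reads the "Bcc:" text as part of the 
subject instead of a new header. One caveat: a blank line in the 
input becomes a continuation line that holds only whitespace, so it 
is worth testing how your mail software treats that case.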

The body doesn't need to be handled unless you send HTML mail (shame 
on you), in which case all the usual XSS issues apply.
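
If you do send HTML mail, a minimal sketch of the usual precaution 
is to escape user input before placing it in the markup. The 
variable names are made up for illustration, and I am assuming a 
UTF-8 message body.

```php
<?php
// Hypothetical user input that would otherwise inject markup.
$comment = '<script>alert("x")</script>';

// Escape &, <, > and both quote styles before embedding in HTML.
$html_body = '<p>' . htmlspecialchars($comment, ENT_QUOTES, 'UTF-8') . '</p>';

echo $html_body . "\n";
```

This only covers text placed between tags or inside quoted 
attribute values; input that ends up in URLs or style attributes 
needs its own handling.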

-- 
Mikko


