[nycphp-talk] NEW PHundamentals Question

Joel De Gan joel at tagword.com
Tue Feb 10 18:14:12 EST 2004


> what's a Captcha?

A captcha is an image-verification test.
It is supposed to stop bots, but quite a few people write anti-captcha
programs (I have had to write some for work to get around such images).
Here is one I made for my email whitelist; it is very effective at
keeping spam out of my inbox:
http://lucifer.intercosmos.net/mail/

The PHP source for creating them is on the main site.

To get around them you can use a program called gocr
(gocr.sourceforge.net), an easily trainable command-line program that
outputs the OCR'd characters from an image. You can get around colors
using ImageMagick's 'convert', and lines are easily removed by
converting the image to black and white, loading it into a
multi-dimensional array in PHP, then identifying the vertical and
horizontal lines and removing them.

It is pretty easy; here are some functions for doing it.

<?php
// just get an image
function getimage($img){
        global $width, $height;
        $im = ImageCreateFromJPEG($img);
        $width = imagesx($im);
        $height = imagesy($im);
   return $im;
}

// thresholds an image into an array: 0 = dark pixel, 1 = light pixel
function getimarr($im){
        $width = imagesx($im);
        $height = imagesy($im);
        for ($y=0;$y<$height;$y++) {
                for ($x=0;$x<$width;$x++) {
                        $rgb = ImageColorAt($im, $x, $y);
                        $col = imagecolorsforindex($im, $rgb);
                        $cols = $col["red"]+$col["green"]+$col["blue"];
                        if($cols < 200){ $out = 0; }else{ $out = 1; }
                        $imagearr[$y][$x] = $out;
                }//rof
        }//rof
   return $imagearr;
}

// remove horizontal lines (vertical is an easy change to this)
// pixel values: 0 = black, 1 = white, 2 = removed
function dehorz($imarr){
        for ($y=0;$y<count($imarr);$y++) {
                // a row is a line only if every pixel in it is black
                $isaline = true;
                for ($x=0;$x<count($imarr[$y]);$x++) {
                        if($imarr[$y][$x] <> 0){
                                $isaline = false;
                        }//fi
                }//rof
                // this is a line
                if($isaline){
                        for ($x=0;$x<count($imarr[$y]);$x++) {
                                // isset() guards the first and last rows, which
                                // have no neighbour above or below
                                $above = isset($imarr[$y-1][$x]) && $imarr[$y-1][$x] == 1;
                                $below = isset($imarr[$y+1][$x]) && $imarr[$y+1][$x] == 1;
                                if($above || $below){
                                        $imarr[$y][$x]=2; // remove it
                                }//fi
                        }//rof
                }//fi
        }//rof
   return $imarr;
}

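// get rid of single spots: a black pixel is removed when the pixels
// below it and to its right are both non-black
function despeckle($imarr){
        for ($y=0;$y<count($imarr);$y++) {
                for ($x=0;$x<count($imarr[$y]);$x++) {
                        // isset() skips the last row and column, which have no
                        // neighbour below or to the right
                        if($imarr[$y][$x] == 0 &&
                                        isset($imarr[$y+1][$x]) && $imarr[$y+1][$x] <> 0 &&
                                        isset($imarr[$y][$x+1]) && $imarr[$y][$x+1] <> 0
                                ){
                                $imarr[$y][$x]=2; // remove it
                        }//fi
                }//rof
        }//rof
   return $imarr;
}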

// for debug
function printhtmlimg($imarr, $debug=1){
        $ret = "\n<table border='0' cellspacing='0' cellpadding='0'>\n";
        for ($cy=0;$cy<count($imarr);$cy++) {
                $ret .= "       <tr>\n";
                for ($cx=0;$cx<count($imarr[$cy]);$cx++) {
                        if($imarr[$cy][$cx] == 0){ 
                                $out = "000000";
                        }elseif($imarr[$cy][$cx] == 2 && $debug == 1){
// debug red
                                $out = "ff0000";
                        }else{ 
                                $out = "ffffff"; 
                        }
                        $ret .= "               <td width=2 height=2 bgcolor='#$out'></td>\n";
                }
                $ret .= "       </tr>\n";
        }
        $ret .= "</table>\n\n";
   return $ret;
}

//then just
$im = getimage("example/C3VC.jpg"); #3P25.jpg
$imarr = getimarr($im);
echo "<h2 color=darkgreen>Captcha hacking <br>-or-<br> how to get around
those annoying 'security' images.</h2>";

echo "<h3>This is a <b>live</b> demo of the 'line insertion' technique
workaround.</h3>";

echo "<h3>Original</h3>";
echo printhtmlimg($imarr);

echo "<h3>Horizontal filter (red removed)</h3>";
echo printhtmlimg(dehorz($imarr));

/*
	now just fix this up and run through gocr after dumping back to 
	file (remember to turn the reds to white) and start training it
	.. I managed about 80-90% accuracy with this.
*/
?>
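
The closing comment in the script mentions dumping the array back to a
file for gocr, with the reds turned to white. A minimal sketch of that
step, assuming plain PBM (P1) output, one of the PNM formats gocr is
documented to read. The helper name imarr_to_pbm() is hypothetical and
not part of the original script. Note that PBM uses 1 for black and 0
for white, the reverse of the array convention used above.

```php
<?php
// Hypothetical helper (not in the original post): serialise the 0/1/2
// pixel array as a plain PBM (P1) string.
// Array convention: 0 = black, 1 = white, 2 = removed.
// PBM convention:   1 = black, 0 = white.
function imarr_to_pbm(array $imarr): string {
        $height = count($imarr);
        $width  = $height > 0 ? count($imarr[0]) : 0;
        $out = "P1\n$width $height\n";
        foreach ($imarr as $row) {
                $bits = '';
                foreach ($row as $pixel) {
                        // only black survives; white and removed (red) become white
                        $bits .= ($pixel == 0) ? '1' : '0';
                }
                // plain PBM asks for lines of at most 70 characters
                $out .= chunk_split($bits, 70, "\n");
        }
        return $out;
}
```

The returned string could then be written out with file_put_contents()
to a file of your choosing and that file handed to gocr for training.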

And...  I just went off on a tangent didn't I?
Anyway.. well, in case you wanted to know how that is done in PHP...
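
The comment on dehorz() says the vertical case is an easy change. One
way that change might look, as a sketch only: devert() is a
hypothetical name, it assumes the same 0 = black, 1 = white,
2 = removed convention, and it treats a column as a line only when
every pixel in it is black, mirroring the row test in dehorz().

```php
<?php
// Hypothetical counterpart to dehorz(): mark full-height black columns
// as removed (2) wherever a white pixel sits directly beside them.
function devert(array $imarr): array {
        $height = count($imarr);
        $width  = $height > 0 ? count($imarr[0]) : 0;
        for ($x = 0; $x < $width; $x++) {
                // a column is a line only if every pixel in it is black
                $isaline = true;
                for ($y = 0; $y < $height; $y++) {
                        if ($imarr[$y][$x] != 0) {
                                $isaline = false;
                                break;
                        }
                }
                if (!$isaline) {
                        continue;
                }
                for ($y = 0; $y < $height; $y++) {
                        // isset() guards the first and last columns
                        $left  = isset($imarr[$y][$x - 1]) && $imarr[$y][$x - 1] == 1;
                        $right = isset($imarr[$y][$x + 1]) && $imarr[$y][$x + 1] == 1;
                        if ($left || $right) {
                                $imarr[$y][$x] = 2; // remove it
                        }
                }
        }
        return $imarr;
}
```

Pixels with black on both sides are left alone, so strokes that cross
the line are not cut, which is the same trade-off dehorz() makes.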

cheers
-Joel De Gan


joeldg - developer, Intercosmos media group.
http://lucifer.intercosmos.net



