[nycphp-talk] Dealing with forum spammers

csnyder chsnyder at gmail.com
Tue Oct 14 12:39:52 EDT 2008


On Tue, Oct 14, 2008 at 11:44 AM, Joe <joedevon at yahoo.com> wrote:
> In response to the fellow having problems with forum spammers, google "Bad Behavior" and install it.
>
> Other ideas.
>
> Create a question that people have to answer, for example "5+2". Or forgo that, because it's a pain, and simply add a form field that you hide from humans with CSS; if that field gets filled out, you know it's a bot and you don't let them in. Alternatively, if they fail that test, you can pop up a captcha. You can also add some JavaScript, which scrapers tend not to fetch, so if the JavaScript wasn't pulled with the page, you know it's likely a bot.
>
> Also add a referer requirement. Make sure the page that led to the form submission came from one of your domains. If it didn't, it's likely a spammer.
>
> Hope this helps.
>

Yeah, this is a "really hard" problem, on the order of stopping spam
from coming into your inbox.

For most sites, on most days, you can get by with a few of the hacks
suggested above. Your goal is really to make your site just different
enough that the spammers will have to rewrite their script in order to
spam you. Most will simply move on rather than do that.
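
As a rough illustration of two of the hacks quoted above (the CSS-hidden
field and the referer check), here is a minimal sketch in Python. The field
name `website_url`, the host list, and the function name `bot_signals` are
all made up for the example, and a missing referer is only a hint, since
some legitimate browsers and proxies strip that header.

```python
from urllib.parse import urlparse

# Hypothetical values for illustration; substitute your own.
ALLOWED_HOSTS = {"example.org", "www.example.org"}
HONEYPOT_FIELD = "website_url"  # rendered in the form but hidden via CSS


def bot_signals(form, referer):
    """Return a list of reasons this submission looks automated."""
    reasons = []

    # Humans never see the hidden field, so any value in it is suspicious.
    if form.get(HONEYPOT_FIELD, "").strip():
        reasons.append("honeypot field filled")

    # The form should have been loaded from one of our own pages.
    host = urlparse(referer or "").hostname
    if host not in ALLOWED_HOSTS:
        reasons.append("referer missing or off-site")

    return reasons
```

An empty list means the submission passed both checks. A non-empty list is
probably better used to trigger a captcha than to reject outright.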

But since we're developers here, we kind of need to think long-term
about the problem. The more 5+2 solutions and tarpit hidden fields
that spammers encounter over time, the smarter their scripts are going
to get. It's a classic arms race.

If I were going to write any sort of comments framework today (and that
includes web forms, or anything else that solicits input from the
anonymous web) I would design it so that everything went through a
spam filter first, and the bigger the better. Like Gmail, if you can
accept that from a privacy point of view, or your organization's
internal spam filter. Then, and only then, would I allow the filtered
comments/responses back into the web system.
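
The filter-first design described above might be sketched like this in
Python. Everything here is hypothetical: `ModerationQueue`, `Comment`, and
especially `toy_filter`, which only counts links and stands in for whatever
real spam filter you would plug in. Rejected comments are quarantined rather
than deleted, so that false positives can be reviewed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Comment:
    author: str
    body: str


@dataclass
class ModerationQueue:
    # Pluggable classifier: returns True when a comment looks like spam.
    is_spam: Callable[[Comment], bool]
    published: List[Comment] = field(default_factory=list)
    quarantined: List[Comment] = field(default_factory=list)

    def submit(self, comment: Comment) -> bool:
        """Filter first; publish only what the filter lets through."""
        if self.is_spam(comment):
            self.quarantined.append(comment)  # kept for human review
            return False
        self.published.append(comment)
        return True


def toy_filter(comment: Comment) -> bool:
    # Placeholder heuristic, NOT a real spam filter: three or more links.
    text = comment.body.lower()
    return text.count("http://") + text.count("https://") >= 3
```

Usage would be something like `queue = ModerationQueue(is_spam=toy_filter)`
followed by `queue.submit(Comment("joe", "Nice post"))`, with the web system
only ever reading from `queue.published`.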

The downside is a huge increase in complexity, and a potential lack of
transparency (false positives are a problem, and how do you train the
system?). But comment spam IS spam; they are the same problem.
Actually it's a little worse, because it's much easier to find comment
forms on the web than it is to find working email addresses.


Chris Snyder
http://chxor.chxo.com/


