[nycphp-talk] OT: Apache access_log integrity

D C Krook dkrook at hotmail.com
Thu Oct 2 00:59:29 EDT 2003


Folks,

I'm trying to trim the fat from Apache's access_log: removing requests from 
my own IP addresses, referer spam, bots, etc.

While I understand that I can exclude known IP addresses and other common 
patterns via mod_setenvif, I'd like to be able to do this on an ad hoc basis: 
when I notice spikes of useless records in the log, or when my IP changes 
because I'm hitting my own site from various wireless points.
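
For reference, the mod_setenvif approach mentioned above might look roughly 
like this in httpd.conf. This is only a sketch: the "dontlog" variable name is 
arbitrary, the "combined" format nickname is assumed to be defined already (as 
it usually is in the stock config), and the two patterns are just examples.

```
# Mark requests from a known address (example IP) so they are not logged
SetEnvIf Remote_Addr "^192\.168\.1\.1$" dontlog
# Mark requests whose Referer matches a spam keyword (example pattern)
SetEnvIfNoCase Referer "viagra" dontlog
# Log everything except requests that carry the dontlog variable
CustomLog /var/log/apache/access_log combined env=!dontlog
```

This only affects requests logged after the change, which is why it doesn't 
help with records already sitting in the log.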

My first idea was to grep the logs by using a shell or Perl script that I 
could add to my daily cron or call arbitrarily like so:

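#!/bin/sh
# Drop requests from my own IP. The pattern is anchored and escaped so that
# 192.168.1.10 etc. are kept (assumes the client address is the first field).
grep -v '^192\.168\.1\.1 ' /var/log/apache/access_log > /var/log/apache/access_log.tmp
mv /var/log/apache/access_log.tmp /var/log/apache/access_log
# Apache still has the old file open, so restart it to pick up the new one.
/export/home/krook/bin/restart-apache.sh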

Of course, this has the drawback of restarting Apache every time the 
access_log is changed by the script, but a second or two of downtime is 
acceptable if it means the logs can be regularly analyzed for useful 
reports that don't have "http://jeff-knights-online-viagra-megastore" as my 
top referer.
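
To make the script above usable ad hoc, one option is to keep the exclusion 
patterns in a separate file, one regex per line, and hand that file to grep. 
A minimal sketch, assuming a hypothetical helper named filter_log and a 
patterns file of my own invention; it writes a filtered copy next to the log 
and leaves the live log alone:

```shell
#!/bin/sh
# Hypothetical helper (not part of Apache): filter_log PATTERN_FILE LOG_FILE
# Writes LOG_FILE.filtered containing every line that matches none of the
# extended regexes listed (one per line) in PATTERN_FILE.
filter_log() {
    patterns=$1
    log=$2
    # -v inverts the match, -E enables extended regexes,
    # -f reads the patterns from a file.
    grep -v -E -f "$patterns" "$log" > "$log.filtered"
    status=$?
    # grep exits 1 when no lines were selected (everything was filtered out);
    # only exit codes above 1 indicate a real error.
    [ "$status" -le 1 ]
}
```

Usage would be something like "filter_log $HOME/log-excludes 
/var/log/apache/access_log" (the excludes path is made up), followed by the 
same mv and restart steps as before. Adding a new spammer or a new wireless IP 
then means appending one line to the patterns file rather than editing the 
script.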

I'd like to know if anyone else has addressed this problem in a successful 
way or has any best practices, whether via mod_setenvif, Perl, CLI PHP, cron, 
etc.

I've Googled the following topics without any good results:
"strip line from access_log perl"
"clean access_log"
"setEnvIf"
"eliminate referer spam access_log"
"remove line from log"


Thanks in advance for any tips.

-Dan
