WARNING: COMPUTER NERD POST FOLLOWS

In writing my blogs and running my server, I've made an effort, from a technical standpoint, to streamline operations as much as possible. I've also tried to make information about this site as public as possible, for the benefit of my reader(s). That's why, for instance, I've made my web stats available for all to see.
In doing this, however, I've run up against a particularly frustrating type of spam: referer spam. In short, there are herds of maliciously controlled computers out there that crawl across the web looking for sites (such as mine) that publish their stats. They repeatedly request pages while claiming, in the Referer field of the request, that they followed a link from some website they're trying to promote. Since I publish my stats, my stats pages then show that a bunch of people came to my website via a link on "besttexasholdem.com" or "livenudegirls.com" or some other website that I can assure you doesn't actually link to me. This boosts the spammers in Google's page rankings, however, since suddenly a bunch of public pages link to them.
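To see why a fake Referer ends up on a public page, it helps to look at what a stats package does with the access log. The sketch below is a rough illustration, not how any particular stats tool works: it assumes Apache's "combined" log format and simply tallies the hostname of each Referer. The function name and sample hostnames are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Apache "combined" log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referer_hosts(lines):
    """Count referring hostnames, roughly like a stats page's 'came from' list."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines that aren't in combined format
        referer = m.group("referer")
        if referer in ("", "-"):
            continue  # direct visit: no Referer header was sent
        host = urlsplit(referer).hostname
        if host:
            counts[host] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Oct/2008:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 '
        '"http://spam.example/page" "Mozilla/5.0"',
        '198.51.100.7 - - [10/Oct/2008:13:56:01 -0700] "GET /blog/ HTTP/1.1" 200 512 '
        '"-" "Mozilla/5.0"',
    ]
    print(referer_hosts(sample))
```

Nothing in the log proves the referring page really links here; the Referer value is whatever the client chose to send, which is exactly what the spam bots exploit.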
Now, I already keep search engines from crawling the stats pages (using the robots.txt file), but that only takes away the payoff for the referer bots. It doesn't stop them from coming, and it doesn't change the fact that they're filling my log files with crap that still shows up in the stats. So, after ignoring the problem for months (if not years), I finally got off my butt and implemented something that should cut down on it considerably.
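For anyone curious how the robots.txt part works, here is a small sketch using Python's standard urllib.robotparser to check which URLs a well-behaved crawler would skip. The /stats/ path and example.com URLs are hypothetical stand-ins, not this site's actual layout. Note that robots.txt is only a request: the referer bots ignore it entirely, which is why it removes the search-ranking payoff without stopping the bogus hits.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to stay out of /stats/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
# Mark the rules as freshly fetched so can_fetch() doesn't treat them as unread.
parser.modified()


def may_crawl(url, agent="Googlebot"):
    """Return True if a crawler that honors robots.txt may fetch url."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(may_crawl("http://example.com/stats/index.html"))
    print(may_crawl("http://example.com/blog/"))
```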
Using a combination of mod_security2 and mod_setenvif (both modules for Apache), I've managed to deny most of the bots access, and I don't even log their requests anymore. If the Referer field of an HTTP request header contains any substring from a list, mod_security sends back an error status of 412 (Precondition Failed), which denies access but still gets logged. I then use mod_setenvif with that same list to set an environment variable, and I tell Apache to leave any request with that variable set out of the log files. This solves all my problems, except for the fact that I'll have to check periodically for new referer spam patterns.
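The decision the two modules make together boils down to a substring test on the Referer. The Python sketch below models that logic only; it is not mod_security or mod_setenvif syntax. The blocklist entries and function names are hypothetical, and matching case-insensitively is an assumption.

```python
# Hypothetical blocklist of referer substrings; real spam patterns change
# over time, so a list like this needs periodic updating.
SPAM_SUBSTRINGS = ("texasholdem", "nudegirls", "casino")


def is_spam_referer(referer):
    """True if the Referer contains any blocklisted substring (case-insensitive)."""
    ref = (referer or "").lower()
    return any(s in ref for s in SPAM_SUBSTRINGS)


def handle_request(referer):
    """Return (HTTP status, whether to write an access-log entry)."""
    if is_spam_referer(referer):
        # Mirrors the mod_security rule: refuse the request with 412.
        # Mirrors the setenvif variable: flag it so the access log skips it.
        return 412, False
    return 200, True


if __name__ == "__main__":
    print(handle_request("http://besttexasholdem.com/"))
    print(handle_request("https://www.google.com/search?q=blog"))
```

Keeping a single list drive both decisions is the point of the setup: the block and the "don't log" flag can never drift apart, and adding a new pattern handles both at once.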
I guess it's good enough for now, and it's a bit of peace of mind, too.