Last year I wrote an article called "How to deal with (block) semalt referrer spam in your Analytics Data". That approach has been fairly successful in lowering the amount of crawler-based referrer spam. As new offenders have popped up, I've adjusted my Apache vhost configurations to keep more of them out of my analytics data, so this article serves as an update to that first one.
In the previous article I discussed using mod_bw to bandwidth-limit the delivery of the cease and desist notices. I've since discontinued this, as it was increasing my server load. I like the idea of slowing down the requests, but I didn't want to keep giving it the server resources. It is also less effective because some of these referrer spammers use botnets.
Here is the quick and dirty version of how this works. First, enable three common modules in Apache using the following commands:
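The original commands are not reproduced here. As a sketch, assuming the three modules are mod_rewrite, mod_proxy, and mod_proxy_http, and assuming a Debian or Ubuntu style install, running `sudo a2enmod rewrite proxy proxy_http` and then restarting Apache should have the same effect as loading the modules directly. The module paths below are assumptions and vary by distribution:

```apache
# Assumed module set: URL rewriting, plus proxying over HTTP.
# Paths match a typical Debian/Ubuntu layout; adjust them for your system.
LoadModule rewrite_module    /usr/lib/apache2/modules/mod_rewrite.so
LoadModule proxy_module      /usr/lib/apache2/modules/mod_proxy.so
LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so
```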
Adjust your Apache vhost config file to load a hosts blacklist file, and add RewriteCond directives to catch referrer spam from common offenders. Any request matching these rewrite conditions is proxied to localhost:8888, which displays (and logs) my Cease and Desist notice.
Updated rewrite rules in Vhost Config to block common referrer spam:
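The actual rules are not reproduced here, so the following is only a minimal sketch of the technique described above. The map name `hosts-deny`, the file path `/etc/apache2/hosts.deny`, the `ServerName`, and the exact patterns are assumptions for illustration. Only `localhost:8888` and semalt come from this article.

```apache
<VirtualHost *:80>
    ServerName www.example.com

    RewriteEngine On

    # Load the blacklist as a plain-text lookup map (key = referrer host).
    RewriteMap hosts-deny "txt:/etc/apache2/hosts.deny"

    # Rule 1: capture the referrer's host name (minus any leading "www.")
    # and proxy the request if that host is a key in the map.
    RewriteCond %{HTTP_REFERER} ^https?://(?:www\.)?([^/:]+) [NC]
    RewriteCond ${hosts-deny:%1|NOT-FOUND} !=NOT-FOUND
    RewriteRule ^ http://localhost:8888/ [P,L]

    # Rule 2: catch a known offender by pattern, including its subdomains.
    RewriteCond %{HTTP_REFERER} semalt\.com [NC]
    RewriteRule ^ http://localhost:8888/ [P,L]
</VirtualHost>
```

The `[P]` flag hands the request to mod_proxy, so the visitor receives whatever the service on port 8888 returns, while the original site never serves the page and the analytics snippet never runs.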
The contents of hosts.deny:
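The actual list is not reproduced here. For illustration only: a text-format RewriteMap file holds one key and one value per line, separated by whitespace, and lines starting with `#` are comments. In this sketch the value `-` is an arbitrary placeholder, since a lookup would only need to know whether the key exists, and every entry other than semalt.com is a hypothetical example domain:

```text
# referrer host            value (unused placeholder)
semalt.com                 -
spam-referrer.example      -
another-spammer.example    -
```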
While it is likely that my Cease and Desist notice isn't legally binding, if I can persuade some of these script operators to stop crawling my sites and behave properly, then it's worth it. All the normal legal disclaimers apply here: I am not a lawyer, this is not legal advice, consult your lawyer to determine the legality of doing this, and anything else that can be said to tell you to use this at your own risk.