As I was testing my link redirector Servlet for the linkblog, someone asked me what I was doing about search engine crawlers. I told him I was inspecting the User-Agent header on every request and excluding anything containing the word bot, which I knew was hardly enough.
I was ready to live with that when I suddenly remembered that AWStats, my favorite logfile analyzer, does a pretty good job of keeping track of robots and spiders. It actually includes a Perl module with around 400 regexp User-Agent matches for all sorts of known robots, spiders, and crawlers.
I converted the AWStats lookup data into a Java class, Robots
, which I used in my Servlet.
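A minimal sketch of what such a lookup class could look like: the class name follows the post, but the patterns below are a tiny illustrative subset I made up, not the actual AWStats data, and the method name isRobot is my own invention.

```java
import java.util.List;
import java.util.regex.Pattern;

public class Robots {

    // In the real class this list would hold the ~400 regexps
    // converted from AWStats' robot lookup data; these few are
    // just placeholders for illustration.
    private static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("googlebot", Pattern.CASE_INSENSITIVE),
            Pattern.compile("msnbot", Pattern.CASE_INSENSITIVE),
            Pattern.compile("slurp", Pattern.CASE_INSENSITIVE),   // Yahoo! Slurp
            Pattern.compile("robot|spider|crawl", Pattern.CASE_INSENSITIVE)
    );

    /** Returns true if the User-Agent string matches a known robot pattern. */
    public static boolean isRobot(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        for (Pattern p : PATTERNS) {
            if (p.matcher(userAgent).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isRobot("Mozilla/5.0 (compatible; Googlebot/2.1)")); // true
        System.out.println(isRobot("Mozilla/5.0 (Windows NT 10.0) Firefox"));   // false
    }
}
```

In the Servlet, the redirect handler would simply skip logging (or counting) any request for which isRobot(request.getHeader("User-Agent")) returns true.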
Thanks to Laurent Destailleur, the author of AWStats, for allowing me to release it into the public domain.