Crawler Detection in Java


As I was testing my link redirector Servlet for the linkblog, Rick asked what I was doing about search engine crawlers. I told him I was inspecting the user-agent of every request and excluding anything containing the words bot, crawler or spider, which I knew was hardly enough. I was ready to live with it when I suddenly remembered that AWStats, my favorite logfile analyzer, does a pretty good job of keeping track of robots and spiders. It includes a Perl module with around 400 regexp user-agent matches for all sorts of known robots, spiders and crawlers. I converted the AWStats lookup data into a Java class, Robots, which I used in my Servlet. Thanks to Laurent Destailleur, the author of AWStats, for allowing me to release it into the public domain.
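For illustration, here is a minimal sketch of the regexp-list approach. It is not the Robots class itself. The class name `RobotDetector`, the method `isRobot`, and the handful of patterns are hypothetical stand-ins for the roughly 400 AWStats entries. The point is that a curated list catches robots whose user-agents lack the generic keywords, such as Yahoo's Slurp or ia_archiver, while the plain bot/crawler/spider check stays in place as a fallback. It assumes Java 9 or later for `List.of`.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical user-agent robot detector sketching the regexp-list approach.
 * The patterns are a tiny illustrative sample, NOT the AWStats data.
 */
public final class RobotDetector {

    // Hypothetical sample entries; a real list would hold hundreds of these.
    private static final List<Pattern> ROBOT_PATTERNS = List.of(
            Pattern.compile("slurp", Pattern.CASE_INSENSITIVE),        // Yahoo! Slurp
            Pattern.compile("ia_archiver", Pattern.CASE_INSENSITIVE),  // Alexa / Internet Archive
            Pattern.compile("teoma", Pattern.CASE_INSENSITIVE),        // Ask Jeeves / Teoma
            // Generic fallback: the original keyword heuristic.
            Pattern.compile("bot|crawler|spider", Pattern.CASE_INSENSITIVE));

    private RobotDetector() {
        // Static utility class; no instances.
    }

    /**
     * Returns true if the user-agent matches any known robot pattern.
     * A missing or empty user-agent is treated as "not a robot" here;
     * a stricter policy could choose the opposite.
     */
    public static boolean isRobot(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) {
            return false;
        }
        for (Pattern pattern : ROBOT_PATTERNS) {
            // find() matches anywhere in the string, like an unanchored Perl regexp.
            if (pattern.matcher(userAgent).find()) {
                return true;
            }
        }
        return false;
    }
}
```

In a Servlet, the check would typically run on `request.getHeader("User-Agent")` before counting or logging a redirect. Precompiling the patterns once in a static field avoids recompiling them on every request.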