Erik's Weblog 2.0

February 22, 2006

Crawler Detection in Java

As I was testing my link redirector Servlet for the linkblog, Rick asked what I was doing about search engine crawlers. I told him I was inspecting the user-agent on all requests and excluding anything containing the words bot, crawler or spider, which I knew was hardly enough.
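For illustration, that kind of keyword check could look roughly like the sketch below. The class and method names (SimpleCrawlerCheck, looksLikeCrawler) are hypothetical and not taken from the actual Servlet, and treating a missing header as "not a crawler" is an assumption.

```java
import java.util.Locale;

// Hypothetical helper: names are illustrative, not from the original Servlet.
final class SimpleCrawlerCheck {

    // The three words mentioned above.
    private static final String[] KEYWORDS = {"bot", "crawler", "spider"};

    private SimpleCrawlerCheck() {
    }

    /** Returns true if the user-agent contains one of the keywords, ignoring case. */
    static boolean looksLikeCrawler(String userAgent) {
        if (userAgent == null) {
            // Assumption: a request without a User-Agent header is not flagged.
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String keyword : KEYWORDS) {
            if (ua.contains(keyword)) {
                return true;
            }
        }
        return false;
    }
}
```

In a Servlet, the value would typically come from request.getHeader("User-Agent"). A check like this misses any crawler that does not happen to use one of the three words in its user-agent.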

I was ready to live with it when I suddenly remembered that AWStats, my favorite logfile analyzer, does a pretty good job of keeping track of robots and spiders. It actually includes a Perl module with around 400 regexp user-agent matches for all sorts of known robots, spiders and crawlers.

I converted the AWStats lookup data into a Java class, Robots, which I used in my Servlet.
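The general idea of a regexp-based lookup can be sketched as follows. This is not the released Robots class: the class name RobotPatternsSketch, the isRobot method and the handful of patterns are placeholders chosen for illustration, and the real AWStats list is far longer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only: NOT the actual Robots class or the AWStats data.
final class RobotPatternsSketch {

    // A tiny sample of regexps standing in for the ~400 AWStats entries.
    private static final String[] SAMPLE_REGEXPS = {
        "googlebot", "msnbot", "slurp", "ia_archiver", "crawl(er)?", "spider"
    };

    // Compile each regexp once, case-insensitively, when the class loads.
    private static final List<Pattern> PATTERNS = new ArrayList<>();

    static {
        for (String regexp : SAMPLE_REGEXPS) {
            PATTERNS.add(Pattern.compile(regexp, Pattern.CASE_INSENSITIVE));
        }
    }

    private RobotPatternsSketch() {
    }

    /** Returns true if any pattern is found anywhere in the user-agent. */
    static boolean isRobot(String userAgent) {
        if (userAgent == null) {
            // Assumption: a missing user-agent is not treated as a robot.
            return false;
        }
        for (Pattern pattern : PATTERNS) {
            if (pattern.matcher(userAgent).find()) {
                return true;
            }
        }
        return false;
    }
}
```

Compiling the patterns once up front keeps the per-request cost to a linear scan over the list, which should be fine for a few hundred entries.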

Thanks to Laurent Destailleur, the author of AWStats, for allowing me to release it in the public domain.
