Crawler Detection in Java

<1 min read

As I was testing my link redirector Servlet for the linkblog, Rick asked what I was doing about search engine crawlers. I told him I was inspecting the user-agent on all requests and excluding anything with the words bot, crawler or spider, which I knew was hardly enough. I was ready to live with it when I suddenly remembered that AWStats, my favorite logfile analyzer, does a pretty good job of keeping track of robots/spiders. It actually includes a Perl module with around 400 regexp user-agent matches for all sorts of known robots, spiders and crawlers. I converted the AWStats lookup data into a Java class, Robots, which I used in my Servlet. Thanks to Laurent Destailleur, the author of AWStats, for allowing me to release it in the public domain.
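To give an idea of how this kind of lookup works, here is a minimal sketch of regexp-based user-agent matching. It is not the actual Robots class: the class name RobotMatcher, the isRobot method and the handful of patterns are assumptions made for illustration, and the AWStats-derived list is far longer.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a Robots-style lookup class.
final class RobotMatcher {
    // Illustrative patterns only; NOT the AWStats list.
    private static final List<Pattern> PATTERNS = List.of(
        Pattern.compile("bot", Pattern.CASE_INSENSITIVE),
        Pattern.compile("crawler", Pattern.CASE_INSENSITIVE),
        Pattern.compile("spider", Pattern.CASE_INSENSITIVE),
        Pattern.compile("slurp", Pattern.CASE_INSENSITIVE),
        Pattern.compile("ia_archiver", Pattern.CASE_INSENSITIVE)
    );

    private RobotMatcher() {
    }

    // Returns true if any pattern is found anywhere in the user-agent string.
    static boolean isRobot(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) {
            return false; // assumption: a missing user-agent is not flagged
        }
        for (Pattern pattern : PATTERNS) {
            if (pattern.matcher(userAgent).find()) {
                return true;
            }
        }
        return false;
    }
}
```

In a Servlet, the string to test would come from request.getHeader("User-Agent"), and the click would simply not be counted when isRobot returns true.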

IKEA, end of the line.

1 min read

We went to IKEA yesterday to have our bed replaced, which they agreed to do one more time. Fine by me, until I found out they no longer do assembly. No thanks. I'm not putting it together myself, or paying an external contractor to do it. We're getting a refund as soon as they pick it up.

The interesting part of the story is that the service manager basically told us they no longer use solid wood, which is why the beds break so easily.

Now we're gonna have to find a bed that won't break and that matches the color of our bedroom furniture. Oh joy!

On a side note, I can't believe how inefficient their computer and inventory systems are. It's amazing that they make so much money considering how archaic the whole process is.

My Java Cursor

<1 min read

I was reading this article on enabling animated pointers in Windows XP and thought to myself that it would be nice to have a Java-themed mouse cursor. I took my Java category icon and rotated it to make it look like a pointer, then used IconArt to turn it into a cursor. I was able to swap the regular Windows cursor by double-clicking on the Normal Select cursor, in the Mouse control panel, under the Pointers tab. If I had any kind of graphic design skills, I would have gladly made a fully animated set/scheme. Now, that would be really cool.

Popular Links

<1 min read

I've implemented a way to view the linkblog's Popular Links. The page and feeds list the most popular links in the last 24 hours. The lists are updated at least every hour.
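Roughly speaking, a 24-hour popularity list boils down to counting clicks inside a time window and sorting by count. The sketch below is only an illustration of that idea, not the linkblog's code: the PopularLinks and Click names, the in-memory list of clicks and the alphabetical tie-break are all assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks links by clicks recorded in the last 24 hours.
final class PopularLinks {
    // One recorded click on a link (assumed shape, for illustration).
    static final class Click {
        final String link;
        final Instant at;

        Click(String link, Instant at) {
            this.link = link;
            this.at = at;
        }
    }

    private PopularLinks() {
    }

    static List<String> top(List<Click> clicks, Instant now, int limit) {
        Instant cutoff = now.minus(Duration.ofHours(24));
        Map<String, Integer> counts = new HashMap<>();
        for (Click click : clicks) {
            // Keep only clicks inside the [cutoff, now] window.
            if (!click.at.isBefore(cutoff) && !click.at.isAfter(now)) {
                counts.merge(click.link, 1, Integer::sum);
            }
        }
        List<String> links = new ArrayList<>(counts.keySet());
        // Most clicks first; ties broken alphabetically (an assumption).
        links.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(links.subList(0, Math.min(limit, links.size())));
    }
}
```

Running something like this from a scheduled job once an hour and caching the result would be one way to keep the page and feeds refreshed without recounting on every request.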

You may have noticed that, for the last few days, all links are redirected when clicked. Needless to say, IPs are checked for duplicates, etc. The user agents are also matched against a list of some 400 known bots, crawlers and spiders. I'll post more details about the bot discovery process later in the week.
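As a rough illustration of the counting side, here is a minimal sketch that ignores bot user agents and counts each IP at most once per link. The ClickCounter class, its method names and the rule of remembering an IP indefinitely are assumptions, and the three-word pattern stands in for the much longer list of known bots.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical click counter: one count per (link, IP), bots excluded.
final class ClickCounter {
    // Stand-in for the full bot list; matches the words bot, crawler or spider.
    private static final Pattern BOT_WORDS =
        Pattern.compile("bot|crawler|spider", Pattern.CASE_INSENSITIVE);

    // For each link, the set of IPs that have already been counted.
    private final Map<String, Set<String>> ipsByLink = new HashMap<>();

    // Returns true if the click was counted, false if it was ignored.
    synchronized boolean record(String link, String ip, String userAgent) {
        if (userAgent != null && BOT_WORDS.matcher(userAgent).find()) {
            return false; // looks like a bot
        }
        // Set.add returns false when this IP was already seen for the link.
        return ipsByLink.computeIfAbsent(link, key -> new HashSet<>()).add(ip);
    }

    synchronized int count(String link) {
        Set<String> ips = ipsByLink.get(link);
        return ips == null ? 0 : ips.size();
    }
}
```

The redirector would call record with the link, request.getRemoteAddr() and the User-Agent header before sending the redirect; the redirect itself happens either way.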

I'm still debating whether I want to include the number of reads on the page. I've pretty much decided against doing so in the feeds, as it would cause aggregators that do not pay attention to modification dates (like Bloglines, etc.) to constantly reload the feed.

I'm still working a few of the details out, but as usual, let me know if there are any problems.

Subversion diff with vim

<1 min read

Once in a while I have to use Subversion on a Linux box. Not a big deal, except for diffs. The standard svn diff output is not the most readable. Vim, or more exactly vimdiff, does a terrific job of displaying differences between files side-by-side. Here's a little Bourne shell script I wrote that uses vimdiff to view the differences between a local file and the latest revision in the repository:

Nothing too fancy. It uses svn cat to get the latest rev, saves it to a temporary file, and opens both files in vimdiff:

The temp file is deleted as soon as vimdiff exits.

I also just found svncommand.vim, a nifty Subversion integration plugin.