Erik's Weblog 2.0

September 3, 2002

RSS 0.94: Spam Free

The RSS 0.94 draft is currently being discussed on various blogs. I'm very interested in the matter since we made jTalk capable of generating both RSS and RDF channels. Like many others, I have a few ideas for extending the proposed specification.

Spam Free

I completely agree with Chuck Shotton. There is a pressing need for a content type attribute. But it must also be explicitly supported by the <webMaster/> and <managingEditor/> sub-elements.

Most of us purposely employ various methods to cloak email addresses on web pages, while they are left wide open on all syndicated feeds. How long will it be before the email extractors figure out a way to collect addresses from our RSS-based feeds? They might even do so already. The one thing we surely don't need is more spam.

Moreover, the current email address format [[email protected] (Full Name)] is non-standard. RSS 0.94 needs to utilize common Internet standards, not create unsupported ones.

The addition of a type attribute would allow us to make use of our clever cloaking techniques. For example:

  <webMaster type="text/plain">
    erik at thauvin dot net
  </webMaster>

  <webMaster type="text/html">
    &lt;img src="http://host/email.gif"&gt;
  </webMaster>

  <webMaster type="text/html">
    &lt;a href="http://host/contact.html"&gt;contact me&lt;/a&gt;
  </webMaster>

  <webMaster type="text/html">
    &lt;a href="ldap://host/o=abc??sub?(cn=John Doe)"&gt;John Doe&lt;/a&gt;
  </webMaster>


The sky's the limit.
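
On the reading side, an aggregator would have to branch on the proposed type attribute. Here is a rough sketch in Python of what that might look like. The function name render_contact is made up for illustration, and treating a missing type as text/plain is my own assumption, not something the draft says.

```python
import html
import xml.etree.ElementTree as ET


def render_contact(element_xml: str) -> str:
    """Return an HTML fragment for a <webMaster/> or <managingEditor/> element."""
    element = ET.fromstring(element_xml)
    content = (element.text or "").strip()
    # Assumption: a missing type attribute is treated as plain text.
    content_type = element.get("type", "text/plain")
    if content_type == "text/html":
        # The XML parser has already decoded &lt; and &gt;, so this is markup.
        # A real aggregator would sanitize it before displaying it.
        return content
    # Plain text is escaped so it cannot inject markup into the page.
    return html.escape(content)


plain = '<webMaster type="text/plain">erik at thauvin dot net</webMaster>'
linked = ('<webMaster type="text/html">'
          '&lt;a href="http://host/contact.html"&gt;contact me&lt;/a&gt;'
          '</webMaster>')

print(render_contact(plain))   # erik at thauvin dot net
print(render_contact(linked))  # <a href="http://host/contact.html">contact me</a>
```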

Comments

I'd also like to see an optional <comments/> sub-element added to <item/>. The required tag value would contain the location of the readers' comments associated with a specific item. For example:

  <comments>http://host/?comments=1&amp;postid=1</comments>
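
One thing to watch when generating this element: comment URLs tend to carry query strings, and a literal & has to be written as &amp; inside XML. Below is a small Python sketch that lets the XML library do the escaping. The helper name item_with_comments and the sample title are hypothetical, and the <comments/> element itself is only my proposal.

```python
import xml.etree.ElementTree as ET


def item_with_comments(title: str, link: str, comments_url: str) -> str:
    """Serialize an <item/> carrying the proposed <comments/> sub-element."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # ElementTree escapes '&' in text as '&amp;' when serializing.
    ET.SubElement(item, "comments").text = comments_url
    return ET.tostring(item, encoding="unicode")


xml_text = item_with_comments(
    "Sample post",
    "http://host/?postid=1",
    "http://host/?comments=1&postid=1",
)
print(xml_text)

# Reading it back yields the original URL with a plain '&'.
print(ET.fromstring(xml_text).findtext("comments"))
```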

Base

Finally, I'd like to see an optional <base/> sub-element added to <channel/>. The required tag value would contain the absolute URI of the current channel, in exactly the same fashion as the <base> tag is used in HTML document headers. For example:

  <base>http://host/path/</base>

RSS aggregators would use the channel's base location to resolve all relative links. In other words:

  <link>2002/08/27</link>

would be automatically interpreted as:

  http://host/path/2002/08/27
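
For what it's worth, this is the same relative-reference resolution browsers already do, so an aggregator would not need to invent anything. A minimal Python sketch, assuming a feed that carries the proposed <base/> element (the feed snippet and the function name resolved_links are made up for illustration):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

FEED = """<rss version="0.94">
  <channel>
    <base>http://host/path/</base>
    <item><link>2002/08/27</link></item>
    <item><link>http://elsewhere/absolute</link></item>
  </channel>
</rss>"""


def resolved_links(feed_xml: str) -> list:
    """Resolve every item link against the channel's <base/>, if present."""
    channel = ET.fromstring(feed_xml).find("channel")
    # Without a <base/>, urljoin leaves the links untouched.
    base = channel.findtext("base", default="").strip()
    return [
        urljoin(base, item.findtext("link", default="").strip())
        for item in channel.findall("item")
    ]


print(resolved_links(FEED))
# ['http://host/path/2002/08/27', 'http://elsewhere/absolute']
```

Links that are already absolute pass through unchanged, so feeds mixing both styles would still work.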

Dates

I don't agree with the purpose and format of the new <pubDate/> sub-element of <item/>. First, the obsolete RFC-822 reference should be updated to point to RFC-2822. Furthermore, the value should take either date or date-time formats, meaning that it should accept:

  Tue, 3 Sep 2002 13:13:11 GMT

or

  3 Sep 2002
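
Accepting both forms is not much work for a parser. Here is a rough Python sketch, with a made-up function name parse_pub_date. It assumes English month names, and it assumes UTC when a date-time carries no zone. The date-only form is my proposal and is not part of RFC-2822 itself, so it is handled as a separate fallback.

```python
from datetime import date, datetime, timedelta, timezone
from email.utils import parsedate_tz


def parse_pub_date(value: str):
    """Return a datetime for a full date-time, or a date for a bare date."""
    parsed = parsedate_tz(value)  # returns None when there is no time part
    if parsed is not None:
        # parsed[9] is the zone offset in seconds; assume UTC if it is missing.
        offset = timedelta(seconds=parsed[9] or 0)
        return datetime(*parsed[:6], tzinfo=timezone(offset))
    # Fallback for the proposed date-only form, e.g. "3 Sep 2002".
    return datetime.strptime(value.strip(), "%d %b %Y").date()


print(parse_pub_date("Tue, 3 Sep 2002 13:13:11 GMT"))  # 2002-09-03 13:13:11+00:00
print(parse_pub_date("3 Sep 2002"))                    # 2002-09-03
```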

Sorry, Brent. I feel your pain, mon ami. But the date should reflect the actual publication date, not the modification date/time. What you're really looking for is a <modDate/> tag. Not a bad idea. I like it.

I've checked the <dc:date/> implementation in quite a few RSS 1.0 feeds, and there really doesn't seem to be much willingness to provide a specific time at the item level.

Sometimes it is wise to learn from the competition. Having multiple choices is a good thing.