RSS (6)

1 Name: Doctroid : 2008-08-11 06:53 ID:L59QDBV8

What's with this board's RSS? Google Reader doesn't see half the new posts. But I just tried using Safari's built-in RSS reader, and it does seem to get the new posts. Unfortunately I normally do not use Safari.

Is there something nonstandard about the RSS source?

Is Dag still alive?

2 Name: Doctroid : 2008-08-11 09:39 ID:jPMr//pr

The above post did turn up, though.

3 Name: Doctroid : 2008-08-11 10:40 ID:jPMr//pr

I know from nothing about RSS. But, here's a truncated version of this site's RSS source:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Interrobang Cartel message board</title>
<link>http://www.interrobangcartel.com/forums/index.html</link>
<description>Posts on Interrobang Cartel message board at www.interrobangcartel.com.</description>
<item>
<title>RSS (2)</title>
<link>http://www.interrobangcartel.com/forums/kareha.pl/1218451990/</link>
<guid>http://www.interrobangcartel.com/forums/kareha.pl/1218451990/</guid>
<comments>http://www.interrobangcartel.com/forums/kareha.pl/1218451990/</comments>
<author>Doctroid</author>
<description><![CDATA[ <p>What's with this board's RSS? Goolge Reader doesn't see half the new posts. But I just tried using Safari's builtin RSS, and it does seem to get the new posts. Unfortunately I normally do not use Safari.</p><p>Is there something nonstandard about the RSS source?</p> ]]></description>
</item>

[...]

</channel>
</rss>

And here's RSS source from Ravelry, which as far as I can tell is handled correctly by Google Reader (as are, to my knowledge, all the other non-IC feeds I subscribe to, but then again I didn't notice the problem with this feed for quite a while):

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>Doctroid's Ravelry friend activity (filtered: project photo, stash photo, queue, fave, magic link, comment, handspun)</title>
<link>http://www.ravelry.com/people/Doctroid/friends/activity?</link>
<pubDate>Sun, 10 Aug 2008 01:26:55 GMT</pubDate>
<description>Doctroid's Ravelry friend activity (filtered: project photo, stash photo, queue, fave, magic link, comment, handspun)</description>
<item>
<title>jwgh's Cube blanket</title>
<link>http://www.ravelry.com/projects/jwgh/fibo-optic</link>
<description>&lt;p class='user'&gt;&lt;a href="http://www.ravelry.com/people/jwgh"&gt;&lt;img border="1" src="http://avatars.ravelry.com/jwgh/68853/1115660-1_medium.jpg" style="border: 1px solid #666666;" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p class='type'&gt;&lt;img class="icon activity_icon" height="16" src="http://assets.ravelry.com/images/silk-color_swatch.png" width="16" /&gt; &lt;a href="http://www.ravelry.com/projects/jwgh/fibo-optic"&gt;jwgh's Cube blanket&lt;/a&gt; &lt;/p&gt;&lt;p class='photo'&gt;&lt;img border="1" src="http://farm4.static.flickr.com/3242/2744477157_bb31d0bba6.jpg" style="border: 1px solid #666666;" /&gt;&lt;/p&gt;&lt;div class="notes" style="margin-top:1em;"&gt;&lt;/div&gt;</description>
<pubDate>Fri, 08 Aug 2008 22:01:47 GMT</pubDate>
<guid>http://www.ravelry.com/projects/jwgh/fibo-optic?activity=8251608</guid>
<author>jwgh</author>
</item>

[...]

</channel>
</rss>

The elements are in different order, but I assume that's not significant. More substantive-looking differences are:

(1) Ravelry's rss element includes the xmlns:dc attribute; IC's doesn't.
(2) Ravelry's channel and items both have pubDate elements; IC's don't. (But the dates are human readable, so I suspect they are not parsed by the reader software.)
(3) Ravelry's item description is HTML with the < and > turned into &lt; and &gt;; IC's is wrapped in a <![CDATA[ ... ]]> section instead.
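
On difference (3), a quick way to check whether the two encodings matter is to parse both and compare. This is only a sketch using Python's standard xml.etree.ElementTree parser; the two one-item snippets are made up for illustration and are not copied from either feed.

```python
import xml.etree.ElementTree as ET

# Two hypothetical items carrying the same HTML, encoded the two ways
# described in point (3): a CDATA section versus entity escaping.
cdata_item = "<item><description><![CDATA[<p>Hello</p>]]></description></item>"
escaped_item = "<item><description>&lt;p&gt;Hello&lt;/p&gt;</description></item>"


def description_text(item_xml: str) -> str:
    """Return the parsed text content of the item's <description>."""
    return ET.fromstring(item_xml).findtext("description")


# An XML parser hands back the same string for both encodings.
print(description_text(cdata_item) == description_text(escaped_item))
```

If a reader uses a conforming XML parser, it should see identical description text either way, so this difference is unlikely to explain the missing posts.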

The IC item above is one that was shown by Google Reader, but it doesn't look to be substantively different from items that weren't.

How does the reader software know what's been read already and what hasn't? That is, when it gets the RSS source, how does it decide which items to show? It seems to me that after a new thread is started, any time that thread is updated, the only change in the RSS will be in the item title, which shows a larger message count than before. Since there is no pubDate element, it can't use that to decide, and since the description element is based on the start of the thread rather than the new message in the thread (which, by the way, is annoying), it can't use that either. Hmm, if I look at all IC items in Google Reader, it shows "All Roads Lead Away From Rome" twice: once with a post count of 18 (in the title) dated last Dec 6, and once with a post count of 11 dated Dec 13! But in fact posts 11 and 18 were made yesterday and today, respectively. Weird, weird.

OK, I just hit refresh and the entry dated Dec 6 now shows the post count as 20. So it's seeing the IC entry in the RSS but deciding it's something I've already read, even though the title has changed.

... OK, just checked some other feeds; most use pubDate elements but some don't, so that doesn't seem to be relevant. And some use <![CDATA[ ... ]]> sections, so that's probably not it either.
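
For what it's worth, the pubDate format is machine-readable as well as human-readable: it is the same date format used in email headers. A minimal sketch using Python's standard library, with the pubDate value copied from the Ravelry item above; the cutoff date is a made-up value for illustration, and nothing here says whether Google Reader actually uses pubDate.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# pubDate value copied from the Ravelry item quoted above.
pub_date = "Fri, 08 Aug 2008 22:01:47 GMT"

# Parse the email-style date into a timezone-aware datetime.
published = parsedate_to_datetime(pub_date)
print(published.isoformat())

# Hypothetical cutoff: a reader could compare dates like this to decide
# whether an item is newer than its last visit.
cutoff = datetime(2008, 8, 8, tzinfo=timezone.utc)
print(published > cutoff)
```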

So could it be Google Reader is simply regarding the item as unchanged and hence read because its description is unchanged? But then, how come it shows two 'All Roads' items, having the same description? Weird.
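
One possible explanation, sketched below, is that the reader identifies an item by its guid rather than by its title or description. This is an assumption about Google Reader, not something confirmed here; ReadTracker, make_feed, and item_key are hypothetical names made up for the sketch. The guid is the thread URL from the IC feed above, which stays the same however many posts the thread gets.

```python
import xml.etree.ElementTree as ET

# guid of the thread item from the IC feed quoted above.
THREAD_URL = "http://www.interrobangcartel.com/forums/kareha.pl/1218451990/"


def make_feed(title: str) -> str:
    """Build a one-item feed whose title changes but whose guid does not."""
    return (
        '<rss version="2.0"><channel>'
        f"<item><title>{title}</title><guid>{THREAD_URL}</guid></item>"
        "</channel></rss>"
    )


def item_key(item: ET.Element) -> str:
    """Identify an item by its guid, falling back to its link."""
    guid = item.findtext("guid")
    return guid if guid is not None else item.findtext("link", default="")


class ReadTracker:
    """Hypothetical reader: an item is new only if its key is unseen."""

    def __init__(self) -> None:
        self.seen: dict[str, str] = {}  # key -> latest title

    def update(self, feed_xml: str) -> list[str]:
        """Store the latest titles and return titles of unseen items."""
        new_titles = []
        for item in ET.fromstring(feed_xml).iter("item"):
            key = item_key(item)
            title = item.findtext("title", default="")
            if key not in self.seen:
                new_titles.append(title)
            self.seen[key] = title  # title refreshes even when not new
        return new_titles


tracker = ReadTracker()
print(tracker.update(make_feed("RSS (2)")))  # first fetch: item is new
print(tracker.update(make_feed("RSS (3)")))  # same guid: nothing new
print(tracker.seen[THREAD_URL])              # but the stored title changed
```

Under that assumption the refreshed title with an old date is what you would expect. It does not explain the two "All Roads" entries, though.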

4 Name: talysman!!/0CigS8/ : 2008-08-11 12:18 ID:BRvmOLbc

The main thing wrong is this (from the W3.org validator):

> RSS feeds should be served as application/rss+xml (RSS 1.0 is an RDF format, so it may be served as application/rdf+xml instead). Atom feeds should use application/atom+xml. Alternatively, for compatibility with widely-deployed web browsers, any of these feeds can use one of the more general XML types - preferably application/xml.

The other thing wrong is that the content of the <author> element is not an email address.
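
Both validator complaints can be checked mechanically. The sketch below is not the validator's own logic: the set of media types is just the ones named in the quoted message, the email test is a deliberately loose pattern, and the function names and sample inputs are made up for illustration.

```python
import re

# Media types named in the validator message quoted above.
FEED_MEDIA_TYPES = {
    "application/rss+xml",
    "application/rdf+xml",
    "application/atom+xml",
    "application/xml",
}

# Loose pattern: something@something, with no whitespace in either part.
EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+")


def media_type_ok(content_type: str) -> bool:
    """Check a Content-Type header value, ignoring parameters like charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in FEED_MEDIA_TYPES


def author_looks_like_email(author: str) -> bool:
    """Check whether an <author> value contains an email-like token."""
    return EMAIL_LIKE.search(author) is not None


# Hypothetical header values; the thread does not say what IC serves.
print(media_type_ok("application/rss+xml; charset=utf-8"))
print(media_type_ok("text/html"))

# "Doctroid" is the author value in the IC item above; the second
# value is a made-up example of the email-style form.
print(author_looks_like_email("Doctroid"))
print(author_looks_like_email("doctroid@example.com (Doctroid)"))
```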

However, I'm not sure why either would specifically mess up Google Reader. It could be that Google Reader checks the optional pubDate element, or that it only crawls at specific times, whereas a feed reader embedded in a browser checks your feeds when you open your browser.

The xmlns attribute is not the problem, though. XML Namespaces are used to extend RSS. In the case of Ravelry, they're using the Dublin Core Metadata Element Set, which adds a bunch of elements like "contributor", "creator", "publisher", "rights".
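
As a rough illustration of how a namespaced element is read, here is a sketch using Python's xml.etree.ElementTree. The sample item and its dc:creator element are made up; the Ravelry excerpt above declares the dc namespace but does not show any dc elements in use.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace URI, as declared on Ravelry's <rss> element.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical item using dc:creator alongside plain RSS elements.
feed = (
    '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<channel><item>"
    "<title>Example item</title>"
    "<dc:creator>jwgh</dc:creator>"
    "</item></channel></rss>"
)

item = ET.fromstring(feed).find("channel/item")

# Plain RSS elements are looked up by bare name.
print(item.findtext("title"))

# Namespaced elements need a prefix-to-URI mapping for the lookup.
print(item.findtext("dc:creator", namespaces={"dc": DC}))
```

A feed that never declares the namespace simply has no such elements, and the plain RSS elements are read the same way in both cases.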

5 Name: Doctroid : 2008-08-11 13:28 ID:jPMr//pr

Well, other feeds without pubDate don't appear to have a problem, and refreshing causes Google Reader to get the new title (with the new post count), but the item shows up as already read with a date of last December. Safari, on the other hand, gets the new title and assigns today's date when you refresh it. It seems Google knows the latest RSS information but doesn't regard it as indicating the item is new.

6 Name: talysman!!/0CigS8/ : 2008-08-11 14:32 ID:BRvmOLbc

The new forums and wiki (whenever those are finally available) have much more configurable RSS/Atom options, so I'll just concentrate on getting those ready. Then you'll be able to try out RSS 1.0, RSS 3.0, Atom, and whatever else we concoct to see what Google Reader can use correctly. Otherwise, since it seems to be a Google Reader issue, you'll probably need to use Safari or something else to read just this feed. Dag's not around.
