RSS fixes #242
The current news-rss.xml fails http://www.feedvalidator.org for a couple of reasons, making feed readers like Liferea misparse it. This patch fixes one of those issues. As @edolstra noted in #90, HTML is not allowed directly in the RSS `<description>` tag. Instead it's supposed to be a text node, where the text can be parsed as HTML. So e.g. an `<a>` tag needs to be written in the feed as `&lt;a&gt;`. This could be handled more easily in an Atom feed, which allows tags from namespaces like XHTML to be embedded directly as long as the namespaces are declared properly. This could also be simpler if xsltproc supported the XPath Functions 3.0 standard, which defines a `serialize` function that does what the mode="serialize" template does in this patch.
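
For illustration only, here is a minimal Python sketch of the same idea (the patch itself does this in XSLT with the mode="serialize" template). The function name `description_element` and the example.org URL are hypothetical; the point is just that the markup ends up as escaped text inside `<description>`, with no child elements.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def description_element(html_fragment: str) -> str:
    """Wrap an HTML fragment in <description> as escaped text (hypothetical helper)."""
    # escape() replaces &, < and > so the fragment becomes plain character data.
    return "<description>" + escape(html_fragment) + "</description>"


fragment = '<a href="https://example.org/">example</a>'
xml_text = description_element(fragment)

# A feed reader's XML parser sees one text node, which it may then parse as HTML.
parsed = ET.fromstring(xml_text)
```

After parsing, `parsed.text` is the original fragment and `parsed` has no child elements, which is the shape the RSS validator expects.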
The current news-rss.xml fails http://www.feedvalidator.org for a couple of reasons, making feed readers like Liferea misparse it. This patch fixes one of those issues. The RSS pubDate element is required to be an RFC-822 date-time. The entries in news.xml did not conform to that specification for two reasons: a missing comma, and swapped month and day fields. So I fixed both of those issues in news.xml, and then fixed news.xsl to extract the date substring at its new offset. This does mean that the HTML version of the news has its month and day swapped now, and frankly I liked it better in the previous month-day order. If desired, the previous output can be recovered by either continuing to use the substring-pasting approach but chopping up the substrings further, or changing news.xml to use ISO 8601 date-times and using the http://exslt.org/date/ extension functions.
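
As a rough illustration of the two fixes (add the comma, put the day before the month), here is a Python sketch. The exact old format of news.xml is not shown in this PR, so the legacy string below is an assumption, and `reorder_legacy` and `to_rfc822` are hypothetical helper names. `email.utils.format_datetime` is the standard-library way to produce an RFC-822 style date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def reorder_legacy(stamp: str) -> str:
    """Rearrange an assumed 'Mon Jan 05 2015 12:00:00 GMT' stamp into RFC-822 order."""
    weekday, month, day, year, clock, zone = stamp.split()
    # Comma after the weekday, then day before month.
    return f"{weekday}, {day} {month} {year} {clock} {zone}"


def to_rfc822(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC-822 date-time in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


fixed = reorder_legacy("Mon Jan 05 2015 12:00:00 GMT")
# Round trip: a conforming pubDate parses back to the same instant.
round_trip = parsedate_to_datetime(fixed)
```

Starting from real datetimes (for example, ISO 8601 values in news.xml) would avoid the string surgery entirely, at the cost of changing the source format.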
The channel's <link> tag should point to an HTML version of the same content as the feed, so it's better to link to /news.html than to /. In addition, the RSS specification says: "the image <title> and <link> should have the same value as the channel's <title> and <link>." http://www.rssboard.org/rss-specification#ltimagegtSubelementOfLtchannelgt Finally, feedvalidator.org says the feed document should include a link to the canonical URL for that feed: http://www.feedvalidator.org/docs/warning/MissingAtomSelfLink.html
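
A small Python sketch of the resulting channel layout, assuming placeholder URLs on example.org and a hypothetical `build_channel` helper (the real feed is generated by news.xsl, and other required channel elements such as `<description>` are omitted here):

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)


def build_channel(title: str, html_url: str, feed_url: str, image_url: str) -> ET.Element:
    """Build a channel whose image title/link mirror the channel's own."""
    channel = ET.Element("channel")
    ET.SubElement(channel, "title").text = title
    # Link to the HTML rendering of the same content, not the site root.
    ET.SubElement(channel, "link").text = html_url
    # Self link: the canonical URL of the feed document itself.
    ET.SubElement(
        channel,
        f"{{{ATOM_NS}}}link",
        {"rel": "self", "type": "application/rss+xml", "href": feed_url},
    )
    image = ET.SubElement(channel, "image")
    ET.SubElement(image, "url").text = image_url
    ET.SubElement(image, "title").text = title
    ET.SubElement(image, "link").text = html_url
    return channel


channel = build_channel(
    "Example news",
    "https://example.org/news.html",
    "https://example.org/news-rss.xml",
    "https://example.org/logo.png",
)
```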
The item's <link> tag should point to an HTML version of the same content as the item. However, there is no URL for each item in the current site, so there's no good place to link to. Fortunately, the link tag is not required when a description is present, so we can just drop that tag entirely. Also, each item should have some sort of unique ID in the <guid> tag. If a feed reader sees an item that's different from any item it has seen before, then the guid allows the reader to distinguish between new items and items that have been edited. http://www.feedvalidator.org/docs/warning/MissingGuid.html Since it has almost never happened that two items were posted with the same pubDate, I've chosen to use the pubDate itself as the guid. There were three duplicates, but I've made them unique by adding one second to the pubDates of the three duplicates that appear earlier in the file, so that sorting by descending pubDate would leave the file order unchanged.
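
The de-duplication was done by hand in news.xml, but the rule can be sketched in Python as follows. It assumes the items are listed newest first, as the descending sort implies, and `make_unique` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def make_unique(pub_dates):
    """Bump earlier-in-file duplicates by one second so every pubDate is unique.

    pub_dates is in file order, newest first.
    """
    result = list(pub_dates)
    seen = set()
    # Walk from the bottom of the file up, so the later entry keeps its date
    # and the entry above it is the one that moves forward.
    for index in range(len(result) - 1, -1, -1):
        while result[index] in seen:
            result[index] += timedelta(seconds=1)
        seen.add(result[index])
    return result


noon = datetime(2015, 1, 5, 12, 0, 0, tzinfo=timezone.utc)
older = datetime(2014, 12, 1, 9, 0, 0, tzinfo=timezone.utc)
unique = make_unique([noon, noon, older])
```

Sorting `unique` in descending order returns the same sequence, so the file order is preserved while each value can serve as a guid.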
First, I removed the maxItem limit from the RSS feed. (It's still used for the HTML version.) The maxItem limit of 1000 was effectively infinite at the rate news items are being published, and each item is quite small so there isn't much bandwidth cost to publishing a lot of them anyway. Second, since the feed does contain every entry ever posted, it is a valid "complete feed" in the sense of RFC5005 section 2, so add the tag marking it as such. If the RSS feed ever gets too large, then RFC5005 section 4 describes how to paginate it into archived feeds.
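
For reference, the RFC5005 marker is an empty `complete` element in the feed-history namespace, placed inside the channel. A minimal Python sketch, with a hypothetical `mark_complete` helper and a placeholder title:

```python
import xml.etree.ElementTree as ET

# Namespace defined by RFC 5005 for feed history elements.
FH_NS = "http://purl.org/syndication/history/1.0"
ET.register_namespace("fh", FH_NS)


def mark_complete(channel: ET.Element) -> ET.Element:
    """Add the RFC 5005 'complete feed' marker as the first child of the channel."""
    channel.insert(0, ET.Element(f"{{{FH_NS}}}complete"))
    return channel


rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example news"
mark_complete(channel)
xml_text = ET.tostring(rss, encoding="unicode")
```

A reader that understands RFC5005 can then treat any item missing from the feed as deleted, rather than merely scrolled off the end.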
Hi! Thanks for the contribution! This repository is much less frequented and has far fewer people with approval and merge rights; don't be alarmed (yet) by the lack of traffic around this PR. 👍 Eyeballed each commit; good separation, easy to understand. I also verified using This'll need a merge/rebase on top of the last news update (sorry), but since your commits are so well-behaved I bet it's going to be easy! Then, once updated, LGTM 🎉!
Here is an assortment of fixes for the `news-rss.xml` RSS feed. Most of these commits just fix issues reported by http://www.feedvalidator.org, although there are a couple of notable side effects.
Since feed items did not include a `<guid>` before, adding one causes all items to appear unread. This is unfortunate but avoids bigger potential problems later.
Since the date format in `news.xml` was not consistent with the RSS specification (which requires RFC-822 format), the HTML version of the news changed from "month day year" order to "day month year". This is fixable in a couple of different ways if it matters to you.
As long as I was doing all that, I also took care of my personal crusade, which is RFC5005 "Feed Paging and Archiving" support. Hardly anybody supports it, but it's really easy to do if you're sticking your complete history in the feed anyway, so I figured I might as well.