RSS and Atom in Action: Newsfeed Formats | Part 2/Page 3 | WebReference

4.5.4 Atom identifiers

Unlike earlier formats, Atom requires that you provide a unique and permanent identifier for each feed and each entry. Each <feed> and <entry> element must contain an <id> element that holds an Atom identifier. An Atom identifier must be:

  • Unique. An Atom <id> must be universally unique.
  • Permanent. An Atom <id> must not change when the feed or entry is republished, exported to another system, or relocated to another host.
  • Formatted as an IRI. An Atom <id> must be a valid Internationalized Resource Identifier (IRI) in accordance with RFC 3987. An IRI is a generalization of the URI that is allowed to contain Unicode characters.

That last bullet about RFC 3987 sounds a little intimidating, but generating ids for your feeds and entries is really not that difficult. In fact, you can use URLs because every URL is a valid IRI. That's probably the easiest route. Use a feed's alternate link as the feed's id and use each entry's alternate link as its id. That's what we did in the Atom feed shown in listing 4.3.
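As a rough illustration of that approach, the following Python sketch checks that the feed and every entry carry an <id> equal to their alternate link. The sample feed, its example.com URLs, and the helper names alternate_link and ids_match_alternate_links are assumptions made for this sketch; they are not taken from listing 4.3.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed used only for this sketch.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <link rel="alternate" href="http://example.com/blog"/>
  <id>http://example.com/blog</id>
  <updated>2006-01-01T00:00:00Z</updated>
  <entry>
    <title>First post</title>
    <link rel="alternate" href="http://example.com/blog/entry1"/>
    <id>http://example.com/blog/entry1</id>
    <updated>2006-01-01T00:00:00Z</updated>
  </entry>
</feed>"""


def alternate_link(element):
    """Return the href of the element's alternate link, or None."""
    for link in element.findall(ATOM + "link"):
        # In Atom, a <link> without a rel attribute is treated as rel="alternate".
        if link.get("rel", "alternate") == "alternate":
            return link.get("href")
    return None


def ids_match_alternate_links(feed_xml):
    """True if the feed and each entry have an <id> equal to their alternate link."""
    feed = ET.fromstring(feed_xml)
    for element in [feed] + feed.findall(ATOM + "entry"):
        atom_id = element.findtext(ATOM + "id")
        if not atom_id or atom_id.strip() != alternate_link(element):
            return False
    return True


print(ids_match_alternate_links(FEED))  # prints True for the sample feed
```

Using the permalink as the id only satisfies the "permanent" requirement if the id is kept unchanged even when the entry later moves to a different URL.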

4.5.5 The Atom content model

To include content in an Atom entry, you use the <content> element. The <content> element is similar to a text construct, but it's more complex because it is designed to support six types of content. These types of content can be included inline within the body of the entry or out-of-line at a web location specified by a URI. Here are the six types of content supported by Atom:
  • Plain text: Text without any markup or any escaped markup.
  • XHTML: Text that may contain XHTML markup. Since XHTML markup is valid XML, it need not be escaped. If you use XHTML in a feed, you must first declare the XHTML namespace, as we did in listing 4.3.
  • Text with escaped HTML: Text that may contain HTML markup, but with all markup escaped so that it is not interpreted as XML.
  • XML: Content may contain XML from a declared XML namespace. In this case, set the type attribute to the content-type of the XML data (and note that content-types for XML data must end with "+xml" or "/xml").
  • Inline content of any type: A <content> element can contain data of any content-type if that data is encoded using Base64 encoding. Set the type attribute to the content-type of the data.
  • Out-of-line content of any type: A <content> element can reference remote content by providing a link to that content in the src attribute of the element and setting the type to the content-type of the remote content.

To specify the type of content in a <content> element, you use the type attribute. Like the type attribute in a text construct, the type can be text, xhtml or html. Let's take a look at some simple examples. Here is an example of a <content> element with type="text" (no HTML is allowed in text content):
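The snippet below is a minimal Python sketch rather than the book's own listing. It builds such an element with the standard library's xml.etree.ElementTree and prints the resulting XML. The URL is a made-up placeholder, and the element is shown as a bare fragment on the assumption that the enclosing <feed> declares the Atom namespace.

```python
import xml.etree.ElementTree as ET

# Fragment only: the enclosing <feed> is assumed to declare the Atom namespace.
content = ET.Element("content", {"type": "text"})
content.text = "Details at http://example.com/blog/entry1"  # hypothetical URL

text_xml = ET.tostring(content, encoding="unicode")
print(text_xml)
# <content type="text">Details at http://example.com/blog/entry1</content>
```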

And here is one with type="xhtml"; we've made the URL into a link:
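Again as a hedged Python sketch rather than the original listing: with type="xhtml", the markup is real XML in the XHTML namespace, and RFC 4287 requires it to be wrapped in a single XHTML <div>. The URL and the "xhtml" prefix are choices made for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
URL = "http://example.com/blog/entry1"  # hypothetical URL

# Serialize XHTML elements with an explicit "xhtml:" prefix (a choice for this sketch).
ET.register_namespace("xhtml", XHTML_NS)

content = ET.Element("content", {"type": "xhtml"})
# type="xhtml" content must be a single XHTML <div> wrapping the markup.
div = ET.SubElement(content, f"{{{XHTML_NS}}}div")
div.text = "Details at "
link = ET.SubElement(div, f"{{{XHTML_NS}}}a", {"href": URL})
link.text = URL

xhtml_xml = ET.tostring(content, encoding="unicode")
print(xhtml_xml)
```

Because the link is a genuine child element, nothing in the output is escaped; the serializer only adds the namespace declaration.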

Here's one with type="html" and the same link; note that all markup is escaped:
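A comparable Python sketch, under the same assumptions (placeholder URL, bare fragment, not the book's listing): the HTML is assigned as plain text, so ElementTree escapes the angle brackets when it serializes the element.

```python
import xml.etree.ElementTree as ET

URL = "http://example.com/blog/entry1"  # hypothetical URL
html_fragment = f'Details at <a href="{URL}">{URL}</a>'

content = ET.Element("content", {"type": "html"})
# Assigning the HTML as text means the serializer escapes < and > for us.
content.text = html_fragment

html_xml = ET.tostring(content, encoding="unicode")
print(html_xml)
```

Parsing the output back yields a <content> element with no child elements, whose text is the original HTML string. That is how a feed reader recovers the markup before rendering it.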

As you can see, Atom has a well thought-out, flexible, and well-specified content model. Now let's discuss how Atom's <link> element can be used to support podcasting.

4.5.6 Podcasting with Atom

Podcasting originated as a feature of RSS, but as the world moves to Atom as the new standard, podcasters will too, and for good reason. Atom supports podcasting through the <link> element. As is the case with RSS 2.0-based podcasts, you can have only one podcast per entry. But with Atom, you can offer a different representation of that podcast for each language and for each content-type. For example, if you want to make a podcast available in both English and German and in both MP3 and WMV formats, you can do it like this:
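The following Python sketch, which stands in for the original listing, generates one <link rel="enclosure"> per combination of language and format. The file URLs, the MIME type chosen for WMV, and the file-naming scheme are assumptions made for illustration. The rel, type, hreflang, and href attributes are those defined for <link> in RFC 4287.

```python
import itertools
import xml.etree.ElementTree as ET

LANGUAGES = ["en", "de"]
# Assumed MIME types for the two formats mentioned in the text.
FORMATS = {"mp3": "audio/mpeg", "wmv": "video/x-ms-wmv"}

# Fragment only: the enclosing <feed> is assumed to declare the Atom namespace.
entry = ET.Element("entry")
for lang, (ext, mime) in itertools.product(LANGUAGES, FORMATS.items()):
    ET.SubElement(entry, "link", {
        "rel": "enclosure",   # marks the link as a podcast file
        "type": mime,         # content-type of this representation
        "hreflang": lang,     # language of this representation
        "href": f"http://example.com/podcasts/show1-{lang}.{ext}",  # hypothetical URL
    })

for link in entry:
    print(ET.tostring(link, encoding="unicode"))
```

A podcatcher can then pick the single link whose hreflang and type best match the listener's preferences.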

Figure 4.7 Newsfeed format family tree

4.6 Summary

Perhaps the best way to summarize this chapter is with the newsfeed format family tree, shown in figure 4.7. You can clearly see the simple vs. RDF fork and Atom's clean break with the past.

Here are some of the key points we covered in this chapter:

  • Atom Publishing Format and the various forms of RSS are XML newsfeed formats. RSS was originally invented to allow the Netscape portal to syndicate news items from other Web sites.
  • Dan Libby and Dave Winer wrote the first RSS specifications in 1999, but since then RSS has forked into two incompatible versions: the RSS 1.0 RDF fork and Dave Winer's RSS 2.0 simple fork.
  • Atom and RSS formats can be extended; you can add your own new XML elements to the formats as long as you do it in a declared XML namespace.
  • Podcasting is a way to distribute any sort of file (not just MP3s for your iPod) via newsfeed, using either an <enclosure> element in RSS 2.0 or <link> elements in Atom format.
  • Podcasting client software, sometimes called podcatchers, and newsfeed readers use newsfeed metadata to decide which podcast files are to be downloaded.
  • RSS 1.0 and RSS 2.0 are the most common newsfeed formats, but both are essentially frozen and the specifications will not be further developed or clarified.
  • Atom Publishing Format (formally the Atom Syndication Format, RFC 4287) is the IETF standard newsfeed format, and it's likely to replace the many incompatible forms of RSS now in use.

This excerpt is taken from Chapter 4 of RSS and Atom in Action, written by Dave Johnson, and published by Manning Publications Co., Copyright © 2006 Manning Publications Co. All rights reserved.