Aggregation Station June 29th, 2007

Duplicating effort is bad, so I often start by googling "rails functionality_i_want". That is often helpful, but not always, and "ruby functionality_i_want" is easily overlooked. I might still end up doing ActiveRecord mapping, but there is much greater diversity in the wider Ruby space. That is how I found FeedNormalizer, my news feed parsing library of choice.

FeedNormalizer is consistent across feed types, allows modifying and adding parsers without changing your application, and can sanitize elements containing HTML for you.

So how do you use this miracle brew?

  1. Install
    gem install feed-normalizer
    
  2. Off to the races
    require 'open-uri'
    require 'rubygems'
    require 'feed-normalizer'
    
    # On Ruby 2.5+ use URI.open; plain open no longer fetches URLs as of Ruby 3.0
    file = URI.open('http://monki.geemus.com/feed/')
    feed = FeedNormalizer::FeedNormalizer.parse file
    # Pull whatever fields you might need
    p feed.title
    p feed.entries
    # etc
    
    Quick and easy. See this intro for details about how this works.

  3. As a fun addition, you can use headers to check for updates before downloading the whole feed.
    # open the feed (URI.open on Ruby 2.5+; plain open no longer fetches URLs as of Ruby 3.0)
    data = URI.open('http://monki.geemus.com/feed/')
    # grab the relevant meta data
    last_modified = data.meta['last-modified']
    etag = data.meta['etag']
    
    # use the feed
    feed = FeedNormalizer::FeedNormalizer.parse data
    # etc
    
    # now ask nicely for updates
    begin
      data = URI.open('http://monki.geemus.com/feed/',
                      'If-Modified-Since' => last_modified,
                      'If-None-Match' => etag)
    rescue OpenURI::HTTPError => e
      # open-uri raises on 304 Not Modified; anything else is a real error
      raise unless e.io.status.first == '304'
      # No updates, so feed was not downloaded.
    else
      # Updates
      feed = FeedNormalizer::FeedNormalizer.parse data
    end
    
Pretty slick: it saves publisher bandwidth and your processing time. Everybody wins!
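
If you want to reuse the conditional fetch from step 3, here is a minimal sketch of how it might be wrapped up using only open-uri. The helper names conditional_headers and fetch_if_updated are my own inventions for illustration, not part of open-uri or FeedNormalizer. The sketch assumes a server may leave out either validator header, so it only sends the ones it actually received, and it assumes any HTTP error other than 304 should still be raised.

```ruby
require 'open-uri'

# Hypothetical helper: build conditional-GET request headers from the
# metadata of a previous response, skipping validators the server did not send.
def conditional_headers(meta)
  headers = {}
  headers['If-Modified-Since'] = meta['last-modified'] if meta['last-modified']
  headers['If-None-Match'] = meta['etag'] if meta['etag']
  headers
end

# Hypothetical helper: return the open-uri response if the feed changed,
# or nil if the server answered 304 Not Modified.
def fetch_if_updated(url, meta)
  URI.open(url, conditional_headers(meta))
rescue OpenURI::HTTPError => e
  # Only swallow 304; re-raise 404s, 500s, and friends.
  raise unless e.io.status.first == '304'
  nil
end
```

With these in place, something like fetch_if_updated('http://monki.geemus.com/feed/', data.meta) would hand back fresh data to parse, or nil when there is nothing new.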

Go forth and aggregate!
