sebastian.martinez

Parsing an OPML with Ruby

Posted by sebastian.martinez
on February 20, 2009

Ruby just doesn't stop surprising us! In the past we had to parse XML files, an incredibly easy task with the Hpricot library. This time it was the turn of OPML (Outline Processor Markup Language) files. In case you are not familiar with this format, its most common use is to exchange lists of web feeds between feed aggregators.

On the desktop weblog we found a function that parses an OPML document recursively, preserving its structure, and extracts the feeds. We modified it a bit: it now returns a hash with the title of each feed as keys and the feed's URL as values.

Here's the function:

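def self.parse_opml(opml_node, parent_names = [])
  feeds = {}
  opml_node.elements.each('outline') do |el|
    # An outline with child elements is a folder: recurse into it
    # and merge whatever feeds it contains.
    if el.elements.size != 0
      feeds.merge!(parse_opml(el, parent_names + [el.attributes['text']]))
    end
    # An outline with an xmlUrl attribute is a feed: store title => URL.
    if el.attributes['xmlUrl']
      feeds[el.attributes['title']] = el.attributes['xmlUrl']
    end
  end
  feeds
end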
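
The function passes parent_names down through the recursion but never uses it, so the folder structure is flattened in the returned hash. As a rough sketch only (parse_opml_with_paths is a hypothetical name, and the sample OPML and example.com URLs are made up), a variant that keeps the folder path of each feed could look like this. It also falls back to the text attribute when an outline has no title, which is an assumption about how you want untitled feeds labelled:

```ruby
require 'rexml/document'

# Hypothetical variant, not the function from the post: returns an array of
# hashes so each feed remembers the folders it was found under.
def parse_opml_with_paths(opml_node, parent_names = [])
  entries = []
  opml_node.elements.each('outline') do |el|
    if el.attributes['xmlUrl']
      # Assumption: use the text attribute when title is missing.
      name = el.attributes['title'] || el.attributes['text']
      entries << { path: parent_names, title: name, url: el.attributes['xmlUrl'] }
    end
    if el.elements.size != 0
      # Recurse into the folder, extending the path with its name.
      child_path = parent_names + [el.attributes['text']]
      entries.concat(parse_opml_with_paths(el, child_path))
    end
  end
  entries
end

# Made-up sample data for illustration.
sample = <<~OPML
  <opml version="1.0">
    <body>
      <outline text="Programming">
        <outline text="Ruby">
          <outline text="Ruby News" title="Ruby News" xmlUrl="http://example.com/ruby.xml"/>
        </outline>
      </outline>
      <outline text="Example Blog" xmlUrl="http://example.com/blog.xml"/>
    </body>
  </opml>
OPML

entries = parse_opml_with_paths(REXML::Document.new(sample).elements['opml/body'])
entries.each { |e| puts "#{(e[:path] + [e[:title]]).join(' / ')} -> #{e[:url]}" }
```

Returning an array instead of a hash also avoids losing feeds that happen to share a title in different folders.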

All you have to do is call it this way:

require 'rexml/document'

opml = REXML::Document.new(File.read('my_feeds.opml'))
feeds = parse_opml(opml.elements['opml/body'])
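
If you don't have an OPML file at hand, here is a self-contained sketch you can paste into a script. It repeats the function as a plain top-level method so it runs on its own, and it parses a made-up OPML string instead of my_feeds.opml (the feed names and example.com URLs are placeholders, not real feeds):

```ruby
require 'rexml/document'

# Copy of the traversal from the post, defined at the top level for this demo.
def parse_opml(opml_node, parent_names = [])
  feeds = {}
  opml_node.elements.each('outline') do |el|
    # Folder: recurse and merge the feeds found inside it.
    if el.elements.size != 0
      feeds.merge!(parse_opml(el, parent_names + [el.attributes['text']]))
    end
    # Feed: store title => URL.
    if el.attributes['xmlUrl']
      feeds[el.attributes['title']] = el.attributes['xmlUrl']
    end
  end
  feeds
end

# Made-up sample: one folder containing a feed, plus one top-level feed.
sample_opml = <<~OPML
  <opml version="1.0">
    <head><title>My feeds</title></head>
    <body>
      <outline text="Ruby">
        <outline text="Ruby News" title="Ruby News" type="rss" xmlUrl="http://example.com/ruby.xml"/>
      </outline>
      <outline text="Example Blog" title="Example Blog" type="rss" xmlUrl="http://example.com/blog.xml"/>
    </body>
  </opml>
OPML

doc = REXML::Document.new(sample_opml)
feeds = parse_opml(doc.elements['opml/body'])

# Print each feed as "title: url".
feeds.each { |title, url| puts "#{title}: #{url}" }
```

Note that the "Ruby" folder itself does not show up in the result: only outlines with an xmlUrl attribute become entries in the hash.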

Pretty easy, huh? Try it out and leave your comments...
