Extracting RSS URIs from NetNewsWire Prefs

OneTrueDabe on 2003-05-28T00:28:54

God bless Matt Sergeant! (and "XML::XPath")

I wanted to get at the list of RSS URLs (URIs, URnID10T, whatever) from the NetNewsWire ".plist" file under Mac OS X. It's XML, so I figured it'd be easy to parse.

Well, "sort of". Instead of being hierarchical or using attributes or any other sort of sane structure, Apple's "plist" files are designed to be very generic so they'll work for any application:




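<plist version="1.0">
<dict>
  <key>autoRefresh</key>
  <integer>2</integer>
  <key>flOpenURLsInBrowserInBackground</key>
  <true/>
  <key>Subscriptions</key>
  <array>
    <dict>
      <key>home</key>
      <string>http://www.perl.com/</string>
      <key>name</key>
      <string>Perl.com News</string>
      <key>rss</key>
      <string>http://www.perl.com/pace/perlnews.rdf</string>
    </dict>
    <dict>
      <key>home</key>
      <string>http://www.slashdot.org/</string>
      <key>name</key>
      <string>Slashdot</string>
      <key>rss</key>
      <string>http://slashdot.org/slashdot.rss</string>
    </dict>
  </array>
</dict>
</plist>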


Note that the Property List format is designed around a very generic schema suitable for any type of application. It stores keys and their corresponding values as character data inside sibling elements, all at the same level, rather than forcing (or, if you're a "glass-half-empty" type of person, "allowing") each application to define its own structure, which might have looked something like this (element names purely illustrative):

  <subscription>
    <name>Slashdot</name>
    <rss>http://slashdot.org/slashdot.rss</rss>
  </subscription>

Unfortunately, that generic layout makes extracting the value of a given key a bit trickier -- now you need to find, say:

"The first '<string>' element following a '<key>' which contains the string 'rss', all within the first '<array>' after a '<key>' containing the word 'Subscriptions'"

I thought about setting up a long chain of SAX event handlers to track what keys have or have not been seen yet, maintaining the state of the previous key(s) as I went, but I figured all that code and complexity would make a program more susceptible to bugs.
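
For comparison, here is a rough sketch of what that state-tracking approach might look like. It is only an illustration, not code from this post: it assumes XML::Parser's stream handlers rather than a true SAX driver, the helper name "rss_urls_by_events" is made up, and it matches key names exactly instead of by substring.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper (not from the original post): collect the "rss"
# strings inside the array that follows the "Subscriptions" key.
sub rss_urls_by_events {
    my ($xml) = @_;

    my @urls;
    my $text     = '';   # character data collected for the current element
    my $last_key = '';   # text of the most recent <key> element
    my $depth    = 0;    # > 0 while inside the Subscriptions array

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $element) = @_;
                $text = '';
                if ($element eq 'array') {
                    if    ($depth)                       { $depth++ }
                    elsif ($last_key eq 'Subscriptions') { $depth = 1 }
                }
            },
            Char => sub {
                my ($expat, $string) = @_;
                $text .= $string;    # text may arrive in several chunks
            },
            End => sub {
                my ($expat, $element) = @_;
                (my $value = $text) =~ s/^\s+|\s+$//g;   # trim whitespace
                if ($element eq 'key') {
                    $last_key = $value;
                }
                elsif ($element eq 'string' && $depth && $last_key eq 'rss') {
                    push @urls, $value;
                    $last_key = '';
                }
                elsif ($element eq 'array' && $depth) {
                    $depth--;
                }
                $text = '';
            },
        },
    );
    $parser->parse($xml);
    return @urls;
}
```

Calling rss_urls_by_events($xml_string) returns the feed URLs in document order. Even this stripped-down version needs three pieces of shared state and three handlers, which is the kind of bookkeeping that invites bugs.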

Enter XML::XPath. By carefully crafting just the right XPath query, I was able to squeeze all that logic down into the holy grail of Perl scripts -- a one-liner:

perl -MXML::XPath -e 'foreach $node (XML::XPath->new(filename => "$ENV{HOME}/Library/Preferences/com.ranchero.NetNewsWire.plist")->find(q{//key[contains(string(),"Subscriptions")]/following-sibling::array[1]/dict/key[contains(string(),"rss")]/following-sibling::string[1]})->get_nodelist) { print $node->string_value, "\n" }'

Awesome! When I saw how well that worked, I must have done the Happy Dance for at least 15 minutes. *THAT'S* what programming is all about! :-)

Of course, There's More Than One Way To Do It, and I would be reluctant to deploy that one-liner in a production environment -- woe betide the poor programmer who had to maintain it after me! -- but as for neat hacks, this one reigns supreme!
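
For that poor maintenance programmer, the same query can be spread over a few lines. The following is a minimal sketch under some assumptions: the sub name "rss_urls" and the script layout are invented for illustration, and the XPath expression is simply the one from the one-liner, joined back together from its individual steps.

```perl
use strict;
use warnings;
use XML::XPath;

# The XPath expression from the one-liner, one location step per line.
my $QUERY = join '/',
    '//key[contains(string(),"Subscriptions")]',  # the Subscriptions key
    'following-sibling::array[1]',                # the array holding its value
    'dict',                                       # one dict per subscription
    'key[contains(string(),"rss")]',              # the rss key in that dict
    'following-sibling::string[1]';               # the string right after it

# Hypothetical helper: takes the same arguments as XML::XPath->new,
# e.g. (filename => $path) or (xml => $string), and returns the URLs.
sub rss_urls {
    my (%source) = @_;
    my $xp = XML::XPath->new(%source);
    return map { $_->string_value } $xp->find($QUERY)->get_nodelist;
}

# Print the feeds from every plist file named on the command line.
for my $plist (@ARGV) {
    print "$_\n" for rss_urls(filename => $plist);
}
```

Saved under a name of your choosing (say, nnw_rss.pl), it would be run as: perl nnw_rss.pl ~/Library/Preferences/com.ranchero.NetNewsWire.plist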