toward the promised land of playlists, through the valley of plists

rjbs on 2006-01-17T16:55:10

I am trying to produce a SAX-based parser for Apple Property List documents. It's been a frustrating, educational experience. BDFOY has a module, Mac::PropertyList, but it's not what I want. First of all, it can't just produce a simple Perl data structure from a deep plist. That's fine: I could patch it to do so, and it would only take a few lines. The bigger concern is that it does its parsing with regular expressions, and takes more than twelve hours to turn my iTunes Music Library file into an object. I'm not sure how much more than twelve hours, because I gave up at that point; I didn't want my CPU pegged while I was on the bus back to work.

Maybe XML::SAX will not do any better. I'll probably write a do-nothing SAX handler to see how long it takes just to dispatch all the events to no-op subs. I really hope it's less than half a day, since the hard work should be getting done by expat.
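For what it's worth, here is a rough sketch of that do-nothing timing test. It is written in Python with the standard xml.sax module (also expat-backed) rather than Perl's XML::SAX, so it only illustrates the idea. The names NoOpHandler and time_noop_parse and the sample document are made up for the illustration.

```python
import io
import time
import xml.sax


class NoOpHandler(xml.sax.ContentHandler):
    """Receives every SAX event and deliberately does nothing with it."""

    def startElement(self, name, attrs):
        pass

    def endElement(self, name):
        pass

    def characters(self, content):
        pass


def time_noop_parse(source):
    """Return the seconds spent parsing `source` and dispatching its events."""
    start = time.perf_counter()
    xml.sax.parse(source, NoOpHandler())
    return time.perf_counter() - start


# Hypothetical stand-in for a large library file: many small key/string pairs.
sample = (
    b"<plist><dict>"
    + b"<key>k</key><string>v</string>" * 10000
    + b"</dict></plist>"
)
elapsed = time_noop_parse(io.BytesIO(sample))
print(f"dispatched all events in {elapsed:.3f}s")
```

Pointing time_noop_parse at the real library file (a filename works as the source too) would give a floor for how fast any handler built on the same parser could be.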

Anyway, this has been my first experience with SAX, and, really, with much XML work at all. I know a fair bit about XML, but I've done nearly nothing with regard to handling it with something other than a specialized module. (I have, for example, used XML::RSS a lot, but that mostly shields me from the XML.) SAX seems like a really cool system. I'm not entirely clear on how to do what I want, yet. My first pass at a solution nearly did what it needed to, but was completely hideous. I think my biggest problem is that I want to change handlers and perform a recursive descent. That is, I want each element to say something like: "now that I'm handling a dict element, I want to use the dict parser until I see the end_element event, and then I will invoke some callback with the value generated by the dict parser."

I think this is possible -- and not a wildly misinformed plan -- but I'm not sure how to do it from the XML::SAX documentation. I think those docs assume that I'm familiar with SAX itself, which I'm not.
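One way to get the same effect without swapping handlers is to keep a stack of open containers inside a single handler: a dict or array start event pushes a new container, and the matching end event pops it and hands the finished value to whatever is now on top. The sketch below is in Python's xml.sax, not XML::SAX, and is a guess at the shape of a solution rather than a tested plist parser. PlistHandler is a made-up name, and date and data elements are left as raw strings as a simplifying assumption.

```python
import xml.sax


class PlistHandler(xml.sax.ContentHandler):
    """Builds plain Python data from plist XML using a container stack."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open containers (dicts and lists), innermost last
        self.keys = []      # pending key per open container (None for lists)
        self.text = []      # character data of the current scalar element
        self.result = None  # finished top-level value

    def startElement(self, name, attrs):
        self.text = []  # drop whitespace seen between elements
        if name == "dict":
            self._open({})
        elif name == "array":
            self._open([])

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        text = "".join(self.text)
        self.text = []
        if name == "key":
            self.keys[-1] = text
        elif name in ("dict", "array"):
            self.keys.pop()
            self._emit(self.stack.pop())  # the "callback" with the built value
        elif name == "integer":
            self._emit(int(text))
        elif name == "real":
            self._emit(float(text))
        elif name in ("true", "false"):
            self._emit(name == "true")
        elif name in ("string", "date", "data"):
            self._emit(text)  # assumption: date/data kept as raw strings

    def _open(self, container):
        self.stack.append(container)
        self.keys.append(None)

    def _emit(self, value):
        """Attach a finished value to its parent, or record the final result."""
        if not self.stack:
            self.result = value
        elif isinstance(self.stack[-1], dict):
            self.stack[-1][self.keys[-1]] = value
            self.keys[-1] = None
        else:
            self.stack[-1].append(value)


# Hypothetical sample document, loosely shaped like an iTunes track entry.
doc = b"""<plist version="1.0"><dict>
  <key>Name</key><string>Airbag</string>
  <key>Track Number</key><integer>1</integer>
  <key>Tags</key><array><string>rock</string><true/></array>
</dict></plist>"""

handler = PlistHandler()
xml.sax.parseString(doc, handler)
print(handler.result)
```

The stack plays the role of the recursion: each nested dict or array "returns" to its parent when its end event arrives, so no handler ever has to be replaced mid-parse.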

John R. suggested that I should just write some sort of XS interface to Apple's plist parser, but I'm pretty sure that my C skills are nowhere near the required level. I guess that could be a reason to brush up on my C!

In the end, this really wouldn't be so ridiculous if Apple had just done something more reasonable for plists. Why aren't they traditional Lisp-form-like plists? If they wanted to translate them to XML, why didn't they translate them to sensible XML? It's weird to imagine that the programmers in charge just failed to understand XML, but I guess I shouldn't be surprised, anymore, at bad XML applications.

I just want my iPod to suggest that I should listen to all of "OK Computer" once in a while. Is that so wrong?


Patches welcome :)

brian_d_foy on 2006-01-17T22:53:31

Mac::PropertyList definitely sucks for big files.

In my implementation, I actually wanted to parse the files on a FreeBSD machine, so I didn't want to do any Mac-specific coding. For small files, which is what I had to deal with, my approach was fine. Everyone seems to want to parse their iTunes Library file, and those are ridiculously huge. I tried to get around that with an iTunes Music Library binary parser and was most of the way there until they changed the format from odd to completely goofy.

If anyone comes up with a better way to do it, I'll gladly patch Mac::PropertyList.

Besides that, the interface to Mac::PropertyList isn't great. It simply creates a big data structure. I intended other people to write layers above that to get to the data, so I didn't do a lot to make the data access any easier.

The situation is even worse now since Apple moved to a binary format for plist files. You can convert it to a plain text form, but that's now an extra step.
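As an illustration of that extra step, here is a small sketch using Python's standard plistlib, which reads both the binary and the XML form; it is not something Mac::PropertyList does. The sample contents are invented, and the binary blob is generated in place so the example does not depend on a real file. On a Mac, the plutil command-line tool does the same conversion.

```python
import plistlib

# Stand-in for a binary plist read from disk (hypothetical contents).
binary_blob = plistlib.dumps(
    {"Name": "Airbag", "Track Number": 1},
    fmt=plistlib.FMT_BINARY,
)

# loads() detects the binary format on its own when no fmt is given.
data = plistlib.loads(binary_blob)

# Re-serialize the same data in the plain-text XML form.
xml_text = plistlib.dumps(data, fmt=plistlib.FMT_XML).decode("utf-8")
print(xml_text)
```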