More XML Sorting

runrig on 2007-07-25T16:53:46

A few days ago I thought I had it all nailed, sorting the elements, but then I noticed this pattern in the XML (seems like a badly designed schema, oh well):


 
   
   
   
   
 

In one doc, the FOOs came first, and in the other, the BARs came first. XML::Filter::Sort didn't handle sorting non-contiguous elements, nor sorting by element name. I briefly looked into patching it to handle that case, but decided against bloating a nice, simple module API.

I remembered that XSLT could sort, so I started looking into that, and tried the first thing that comes up when you search CPAN for XSLT: XML::XSLT. No matter what I tried, though, nothing worked. Then as a last resort I read the fine docs and saw that "sort" was not yet implemented (update: it also seemed to be applying templates from the bottom up -- see the XSLT below).

Then I tried XML::Filter::XSLT, which is based on XML::LibXSLT (and libxslt) and which I had much higher hopes for. I had trouble at first getting it to sort by element name while preserving the attributes (I could never quite find a full example of that), but finally came up with this:

my $sorter = XML::Filter::XSLT->new(Source => {String => <<'EOT'});






  



  
    
    
      
      
    
  



  
    
  



EOT
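For anyone looking for the full example I couldn't find, here is a minimal sketch of the general technique: an identity transform whose children are visited in element-name order. It is an illustration under assumptions, not necessarily the exact stylesheet used above. The `root` wrapper and the `id` attributes are made up for the demo, and it calls XML::LibXSLT directly rather than going through the SAX filter.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical input: FOO and BAR children interleaved under one parent.
my $xml = <<'XML';
<root>
  <FOO id="2"/>
  <BAR id="1"/>
  <FOO id="1"/>
  <BAR id="2"/>
</root>
XML

# Identity transform, except that child nodes are processed sorted by name.
my $xsl = <<'XSL';
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- drop whitespace-only text nodes so they do not get sorted to the front -->
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <!-- attributes first, so they stay attached to their element -->
      <xsl:apply-templates select="@*"/>
      <!-- then the children, ordered by element name -->
      <xsl:apply-templates select="node()">
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
XSL

my $stylesheet = XML::LibXSLT->new->parse_stylesheet(
    XML::LibXML->load_xml(string => $xsl)
);
my $result = $stylesheet->transform(XML::LibXML->load_xml(string => $xml));
my $sorted = $stylesheet->output_string($result);
print $sorted;
```

The attributes are applied in a separate `apply-templates` before the sorted one, which is what keeps them on their elements. Elements with the same name keep their document order unless you add a second `xsl:sort` key.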

One interesting effect was that the encoded characters in the attributes (character references for line breaks and tabs, e.g. &#10; and &#9;) were now coming out as unencoded characters, where previously they were just coming out as spaces. I'm still not sure what the best way would be to preserve the encoding, if I happened to care about preserving them...
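A quick way to see where the decoding happens is to parse a tiny document and look at the attribute value the parser hands back. This is only a sketch: the `item` element and `note` attribute are invented for the demo, and it uses XML::LibXML on its own rather than the SAX pipeline above.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document with character references inside an attribute value.
my $doc = XML::LibXML->load_xml(
    string => '<item note="one&#10;two&#9;three"/>'
);

# By the time we can read the value, the references are already real characters.
my $note = $doc->documentElement->getAttribute('note');

print "newline decoded\n"       if index($note, "\n") >= 0;
print "tab decoded\n"           if index($note, "\t") >= 0;
print "no reference text left\n" if index($note, '&#') < 0;
```

So anything downstream of the parser only ever sees the decoded characters. Whether they get written back out as references is up to whatever serializes the result.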


parsed entities

slanning on 2007-07-26T09:43:33

XML parsers do that entity parsing. Some people will say that it doesn't matter whether the XML contains &amp; or &, because your text isn't actually an object in the physical universe, but rather an abstract representation of a Platonic Unicode document. We dirty Perl programmers get to put up with that crap, too.