Comparing XML docs

runrig on 2007-07-21T01:48:33

I've found a workaround for one of yesterday's annoyances, at least the one where a "folder compare" function (not really a file directory) in Informatica PowerCenter doesn't really compare everything in the folders. You can export folders as XML documents, so that's what I did, and then used XML::Diff to compare the documents.

Except that some of the elements in one document are not in the same order as the other document, and although I don't care, XML::Diff does. There are some commercial tools that will compare unordered XML elements, and I ran across one that I guess was free but is no longer.

So it was XML::Filter::Sort to the rescue (thank you grantm - and all the other XML folks), and I just sorted all the elements where I didn't care about the order, and then diffed the results. Several of the elements were just sorted by the name attribute, so I made a bunch of sorters in one go: (Updated code: needed "./" prefix on NAME)

# One sorter per element type, each ordering its records by the NAME attribute
my @sorters = (
  map {
    XML::Filter::Sort->new(
      Record => $_,
      Keys => [ ['./@NAME'] ],
    )
  } @list_of_elements
);
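
For anyone wanting to try this, here is a minimal sketch of how such a list of sorters might be chained together, assuming XML::SAX::Machines is installed (its Pipeline function is what the XML::Filter::Sort documentation uses). The sort_by_name helper and the SOURCE and TARGET element names are hypothetical, made up for illustration; this is not the code actually used for the PowerCenter exports.

```perl
use strict;
use warnings;
use XML::Filter::Sort;
use XML::SAX::Machines qw( :all );

# Hypothetical helper: sort every run of the named elements by NAME
# and return the re-serialized document as a string.
sub sort_by_name {
    my ($xml, @list_of_elements) = @_;

    # One sorter per element type, as in the post
    my @sorters = map {
        XML::Filter::Sort->new(
            Record => $_,
            Keys   => [ ['./@NAME'] ],
        )
    } @list_of_elements;

    # Chain the sorters; a scalar reference at the end collects the output
    my $sorted   = '';
    my $pipeline = Pipeline( @sorters => \$sorted );
    $pipeline->parse_string($xml);

    return $sorted;
}

# SOURCE and TARGET are made-up element names for this example
my $xml = '<FOLDER><SOURCE NAME="b"/><SOURCE NAME="a"/></FOLDER>';
print sort_by_name($xml, qw(SOURCE TARGET)), "\n";
```

Running both exported documents through the same helper and then handing the two results to XML::Diff should take the element order out of the comparison.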

I also wrote my first actual XML::SAX parser for the task of deleting some attributes where I didn't care about differences in values.
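
A rough sketch of what an attribute-dropping SAX filter can look like, assuming XML::SAX::Base, XML::SAX::ParserFactory and XML::SAX::Writer are available. The package name My::DropAttributes, its Drop option, the drop_attributes helper and the VERSION attribute are all hypothetical names chosen for this example, not the ones from the original filter.

```perl
package My::DropAttributes;
use strict;
use warnings;
use base 'XML::SAX::Base';

# Drop => [ names ] is a made-up option listing attributes to remove
sub new {
    my ($class, %args) = @_;
    my $drop = delete($args{Drop}) || [];
    my $self = $class->SUPER::new(%args);
    $self->{drop} = { map { $_ => 1 } @$drop };
    return $self;
}

# Copy the attribute hash, remove the unwanted entries, pass the rest on
sub start_element {
    my ($self, $el) = @_;
    my %attrs = %{ $el->{Attributes} || {} };
    for my $key (keys %attrs) {
        my $name = $attrs{$key}{LocalName} // $attrs{$key}{Name};
        delete $attrs{$key} if $self->{drop}{$name};
    }
    $self->SUPER::start_element({ %$el, Attributes => \%attrs });
}

package main;
use XML::SAX::ParserFactory;
use XML::SAX::Writer;

# Hypothetical helper: parse a string, drop the named attributes,
# and return the re-serialized XML.
sub drop_attributes {
    my ($xml, @names) = @_;
    my $out    = '';
    my $writer = XML::SAX::Writer->new(Output => \$out);
    my $filter = My::DropAttributes->new(Drop => \@names, Handler => $writer);
    my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
    $parser->parse_string($xml);
    return $out;
}

# VERSION is a made-up attribute name for this example
print drop_attributes('<A NAME="x" VERSION="1"/>', 'VERSION'), "\n";
```

Since it is an ordinary SAX filter, something like this could also sit in the same pipeline as the sorters, so each document only needs to be parsed once.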

And some of the attributes had encoded control characters in them, e.g. , and those just came out as spaces. For the purposes of this I didn't care, but in other situations I might, so I'm wondering if there's a way to preserve them. Though I hear from a reliable secondhand source that there is no reliable way to preserve them :-(


Alternatively: XML::SemanticDiff

tagg on 2007-08-08T07:12:36

Could you have used XML::SemanticDiff instead? That seems to do what you want...

Re:Alternatively: XML::SemanticDiff

runrig on 2007-08-08T17:25:38

I looked at XML::SemanticDiff, and XML::Diff seems to suit my purposes better. XML::SemanticDiff tells you that there's a difference and where the difference is, and XML::Diff tells you all that plus what the difference is, though the output is more verbose. And I would still have to sort and filter things to see the actual differences that I want to see.