Everyone I talk to says that mucking with the XML exported by Informatica's PowerCenter is not recommended, or that they tried and failed... so I had no choice but to ignore that advice when we were upgrading from version 7 to 8. I suspect the advice comes from people who don't quite know what they're doing, and/or who "parse" with grep/awk/sed.
I've been messing with more XML in the last few weeks than I have in the last few years. I have my own (shared) opinion about Informatica/ETL (and I've taken up Aristotle's call to action), but at least it provides an opportunity to practice some XML-fu. Most of the changes were simple attribute edits, but there was one real problem: after importing into v8, deleting a group from a Union transformation crashed the GUI. So I created a Union transformation from scratch in v8, exported it, and compared it to what I was importing, and hey, there was some stuff missing! The FIELDDEPENDENCY elements that tie each input field to an output field weren't there. So I wrote the following:
# Fix Union transformations
# Assumes (setup not shown here): $doc is the export parsed with
# XML::LibXML, and $root is its document element.
my $union_cnt;
for my $trans (
    $root->findnodes(q[
        //TRANSFORMATION[@TYPE="Custom Transformation" and @TEMPLATENAME="Union Transformation"]
    ])
) {
    $union_cnt++;
    my $name = $trans->getAttribute('NAME');
    print "X: Fixing Union transformation $name\n";

    # Collect the output field names, in document order
    my @output;
    for my $field (
        $trans->findnodes(q[
            TRANSFORMFIELD[@GROUP="OUTPUT"]/@NAME
        ])
    ) {
        push @output, $field->value();
    }

    # Per input group, count the fields seen so far; the Nth input
    # field of each group is paired with the Nth output field
    my %dep;
    for my $field (
        $trans->findnodes(q[
            TRANSFORMFIELD[@PORTTYPE="INPUT"]
        ])
    ) {
        my $field_name = $field->getAttribute('NAME');
        my $group      = $field->getAttribute('GROUP');
        my $out_field  = $output[ $dep{$group}++ ];

        # Add the missing dependency element
        my $new = $trans->addNewChild( '', 'FIELDDEPENDENCY' );
        $new->setAttribute( 'INPUTFIELD',  $field_name );
        $new->setAttribute( 'OUTPUTFIELD', $out_field );
    }
}

# Drop every text node so the output can be re-indented on write
$_->unbindNode() for $root->findnodes('//text()');
Warning: this code is not endorsed or guaranteed by anyone for anything!
I removed all the text nodes so that the result would stay pretty-printed on output (is there a better/easier way?):
eval { $doc->toFile($file, 1) } or die "Could not write to $file: $@";
That's safe here because none of the text nodes contain anything but whitespace anyway. XML::LibXML seemed hard to use at first, but once you get used to how the docs are arranged (and learn some XPath), it's quite easy.
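On the pretty-printing question, here are two gentler variations I'd consider, sketched against a made-up miniature document rather than a real PowerCenter export (the $xml sample and its element layout are assumptions for illustration). The first uses the parser's keep_blanks(0) setting to discard ignorable whitespace at parse time; the second keeps the unbindNode approach but narrows the XPath to whitespace-only text nodes, so any text with real content would survive. I haven't run either against a full export, so treat this as a sketch:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical miniature document standing in for a real PowerCenter export.
my $xml = <<'XML';
<FOLDER>
  <TRANSFORMATION NAME="u1">
    <TRANSFORMFIELD NAME="a"/>
  </TRANSFORMATION>
</FOLDER>
XML

# Option 1: ask the parser to discard ignorable whitespace up front.
my $parser = XML::LibXML->new();
$parser->keep_blanks(0);
my $doc = $parser->parse_string($xml);

# Option 2: parse normally, then unbind only whitespace-only text nodes,
# so any text node with real content would survive.
my $doc2 = XML::LibXML->new()->parse_string($xml);
$_->unbindNode() for $doc2->findnodes('//text()[normalize-space(.) = ""]');

# Add a child the same way the Union fix does, then serialize with format 1.
my ($trans) = $doc->findnodes('//TRANSFORMATION');
$trans->addNewChild( '', 'FIELDDEPENDENCY' );
my $out = $doc->toString(1);
print $out;
```

Either way, the new FIELDDEPENDENCY element should come out indented along with its siblings. Note that libxml2 decides what counts as ignorable whitespace heuristically, so the XPath version is the more predictable of the two.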