PowerCenter XML

runrig on 2008-09-28T17:18:01

Everyone I talk with says that mucking with the XML exported by Informatica's PowerCenter is not recommended, or that they tried and failed...so I had no choice but to ignore that advice when we were upgrading from version 7 to 8. I think the advice must be from people who don't quite know what they're doing, and/or "parse" with grep/awk/sed.

I've been messing with more XML in the last few weeks than I have in the last few years. I have my own (shared) opinion about Informatica/ETL (and I've taken up Aristotle's call to action), but at least it provides an opportunity to practice some XML-fu. Most transformations were simple changing of some attributes, but there was one issue where after importing into v8, if you delete a group from a Union transformation, the GUI crashes. So I created a Union transformation from scratch, exported it, and compared it to what I was importing, and hey, there was some stuff missing! So I wrote the following:

  # Fix Union transformations
  my $union_cnt;
  for my $trans (
    $root->findnodes(q[
      //TRANSFORMATION[@TYPE="Custom Transformation" and @TEMPLATENAME="Union Transformation"]
    ])
  ) {
    $union_cnt++;
    my $name = $trans->getAttribute('NAME');
    my $parent = $trans->parentNode();
    print "X: Fixing Union transformation $name\n";
    my @output;
    for my $field (
      $trans->findnodes(q[
        TRANSFORMFIELD[@GROUP="OUTPUT"]/@NAME
      ])
    ) {
      push @output, $field->value();
    }
    my %dep;
    my $dep_cnt;
    for my $field (
      $trans->findnodes(q[
        TRANSFORMFIELD[@PORTTYPE="INPUT"]
      ])
    ) {
      my $name  = $field->getAttribute('NAME');
      my $group = $field->getAttribute('GROUP');
      my $dep_group = $output[$dep{$group}++];
      my $new = $trans->addNewChild( '', 'FIELDDEPENDENCY' );
      $new->setAttribute( 'INPUTFIELD', $name );
      $new->setAttribute( 'OUTPUTFIELD', $dep_group );
    }
  }
  $_->unbindNode() for $root->findnodes('//text()');

Warning: this code is not endorsed or guaranteed by anyone for anything!

The removing of all text nodes was so that the result would stay pretty-printed after output (is there a better/easier way?):

  eval { $doc->toFile($file, 1) } or die "Could not write to $file: $@";

And there are no text nodes with anything but whitespace anyway. XML::LibXML seemed hard to use at first, but once you get used to how the docs are arranged (and learn some XPath), it's quite easy.