Expanding products of sums

tsee on 2009-06-05T17:18:51

A few years ago, when I started studying physics, I wrote a set of modules for representing and dealing with algebraic expressions in Perl: Math::Symbolic. It's not a beauty, but it can be quite useful.

Occasionally, I'm getting mail from people who want it to perform the tasks of a full computer algebra system such as Mathematica. The short answer is: It's not even close to such a thing, it never will be, and was, in fact, never meant to be. One of the most frequent questions I get is a variation of:

"How can I expand this product of sums into a sum of products using Math::Symbolic?"

Here again, the answer is it can't do that out of the box. But since I've been asked so many times, I wrote two implementations of that which you'll find below. This is to prevent anyone asking me ever again :)

The first implementation is really simple and I'd almost call it elegant. It is, however, also quite slow.

use strict;
use warnings;
use Math::Symbolic qw/:all/;
use Math::Symbolic::Custom::Transformation qw/:all/;

my $function = parse_from_string(<<'HERE');
(a + b)*(d + e + f)
HERE
#(b + c + d + e + f)*(a + b)*(d + e + f)*(a + b + c + d)*(a + b + c + d + e) 

print "Before: $function\n";

my $pattern = Math::Symbolic::Custom::Pattern->new(
  parse_from_string('(TREE_x+TREE_y) * TREE_z'),
  commutation => 1,
);

my $expand = new_trafo(
   $pattern => 'TREE_x*TREE_z + TREE_y*TREE_z',
);

while (1) {
  my $result = $expand->apply_recursive($function);
  last if not defined $result;
  $function = $result;
}

print "After: $function\n";

It uses the Math::Symbolic syntax itself to define the logic. Most of the work is actually done by the pattern matching and transformation modules Math::Symbolic::Custom::Pattern and Math::Symbolic::Custom::Transformation. The Pattern class defines search rules that can be matched against the expression's tree. The Transformation specifies rules to replace it with. Kind of like regular expressions. Just not as good (or fast).

The second implementation is likely much more useful and certainly a lot faster (but not optimized). It implements almost all of the logic manually and is based somewhat on Mark Jason Dominus wonderful iterator from Higher Order Perl. (Go, buy the book if you haven't. It's an utterly enjoyable read.)

use strict;
use warnings;
use Math::Symbolic qw/:all/;

my $function = parse_from_string(<<'HERE');
(a + b)*(d + e + f)
HERE
#(b + c + d + e + f)*(a + b)*(d + e + f)*(a + b + c + d)*(a + b + c + d + e) 

# First, split the product into sums
my @sums = split_formula( B_PRODUCT, $function );

#print "$_\n" foreach @sums;

# Split each sum into its sub-terms
my @terms = map {
  [ split_formula( B_SUM, $_ ) ]
} @sums;

my $n_terms = 1;
$n_terms *= @$_ for @terms;
print "Calculating all $n_terms terms...\n";
print "@$_\n" foreach @terms;

# This calculates the full formula in memory and stores it in $function
# $function = multiply(\@terms);
# print $function, "\n";

# This calculates each term and then prints it to STDOUT, but doesn't
# store it because memory is scarce
multiply_print(\@terms);



# We have to keep in mind that the formula is really a tree.
sub split_formula {
  my $optype = shift;

  my @formulas = @_;
  
  my @split;
  while (@formulas) {
    my $f = shift @formulas;
    if ($f->term_type == T_OPERATOR and $f->type == $optype) {
      push @formulas, @{ $f->{operands} };
    }
    else {
      push @split, $f;
    }
  }

  return @split;
}


# all of the following is based on the iterator
# pattern of Mark Jason Dominus' "Higher Order Perl", p. 128ff
sub multiply {
  my $terms = shift;
  my ($max, $count) = make_pattern($terms);

  my $func = make_product($terms, $count);
  return $func unless increment($max, $count);

  while (1) {
    my $prod = make_product($terms, $count);
    $func += $prod;
    last unless increment($max, $count);
  }

  return $func;
}

sub multiply_print {
  my $terms = shift;

  my $iter = make_term_iterator($terms);

  my $first = 1;
  while (1) {
    my $prod = $iter->();
    last if not defined $prod;
    if ($first) {
      $first = 0;
      print $prod;
    } else {
      print " + " . $prod;
    }
  }

  print "\n";
  return;
}

sub make_term_iterator {
  my $terms = shift;
  my ($max, $count) = make_pattern($terms);

  my $empty = 0;
  return sub {
    return() if $empty;
    my $func = make_product($terms, $count);
    $empty = !increment($max, $count);
    return $func;
  };
}

sub make_product {
  my $terms = shift;
  my $count = shift;

  # Note: One *could* save some CPU cycles by not cloning here (new).
  # BUT that may lead to fun debugging and interesting memory cycles
  # if you intend to actually use the tree.
  my $prod = $terms->[0][ $count->[0] ]->new;
  foreach my $i (1..$#$terms) {
    $prod *= $terms->[$i][ $count->[$i] ]->new;
  }
  return $prod;
}

sub increment {
  my $max = shift;
  my $count = shift;

  my $i = $#$count;
  while (1) {
    if ($count->[$i] < $max->[$i]) {
      $count->[$i]++;
      return 1;
    }
    else {
      $count->[$i] = 0;
      $i--;
    }
    if ($i < 0) {
      return();
    }
  }
}

sub make_pattern {
  my $terms = shift;
  my @max;
  my @pattern;
  foreach my $set (@$terms) {
    push @max, $#$set;
    push @pattern, 0;
  }
  return \@max, \@pattern;
}

I bet you can see why that second implementation doesn't give me as much of a warm, fuzzy feeling.

Cheers,
Steffen