Another SQL pipeline

runrig on 2007-04-26T17:59:52

We have a program that generates a report, and I was supposed to just change the servers that the data is coming from. But I couldn't even test the current version of the report, because the main SQL statement causes the temp db space to fill up, and so the program dies.

So I started tearing apart the SQL statement, finding non-obvious cartesian joins (there was a 'distinct' qualifier on the select list, which was supposed to have made it ok, but actually just buried the problem), and missing joins, which was made difficult since this is a third party database (Informatica), and nobody fully knows the schema (and I have been unable to google any help). Once I had separated sql statements, I didn't want to put it back together, I just wanted to run a big nested foreach loop, but without having a big ugly nested foreach loop. And without putting as much work into it as I put into my previous pipeline.

So I came up with a sort of lightweight semi-flexible iterator based functional system like so:

sub get_customers {
  mk_sql_iterator(
    SQL => 'select cust_id, cust_name from customers',
    SELECT => [qw(CUST_ID CUSTOMER_NAME)],
    @_,
  );
}

sub get_orders {
  mk_sql_iterator(
    SQL => 'select order_no, order_amt from orders where cust_id = ?'
    SELECT => [qw(ORDER_NO TOTAL_AMT)],
    ARGS => ['CUST_ID'],
    @_,
  );
}

my $f = mk_iterator(
  $dbh,
  \&get_customers,
  \&get_orders,
);

while (my $order = $f->()) {
  # $order is hash ref with keys CUST_ID, CUSTOMER_NAME,
  # ORDER_NO, and TOTAL_AMT
}

Now it's easy to inject other selects and filters into the pipeline, make the iterator generators take arguments to modify or override the default behaviour, build up and test the pipeline as I go, etc., and I'm happy for now :-)