wants

inkdroid on 2003-06-14T21:07:00

I want two CPAN modules, which may or may not exist:

XML::Filter::Simple

An XML::SAX handler that does what XML::Simple does. So if I have some XML like this:

    <foo a="1">
	<bar>baz</bar>
	<bar>bez</bar>
    </foo>

I would be able to do this:

    my $foo = XML::Filter::Simple->new();
    my $parser = XML::SAX::ParserFactory->parser( Handler => $foo );
    $parser->parse_string( $xml );

    ## same kind of data structure as XML::Simple
    print $foo->{ a };		# prints 1
    print $foo->{ bar }[0];	# prints baz
    print $foo->{ bar }[1];	# prints bez

Politics::US

I want to be able to keep up to date with the goings-on of my congressman and senators, and I want Perl to help me.


use Politics::US::Senator;
use Politics::US::Bill;

my $senator = Politics::US::Senator->new( 'Durbin, Richard' );
foreach my $vote ( $senator->votes() ) {

     my $decision = $vote->decision();
     my $billNum = $vote->billNumber();
     my $bill = Politics::US::Bill->new( $billNum );

     print
          "Today Durbin cast a vote of $decision regarding $billNum\n",
          "The bill's title is: ", $bill->title(), "\n",
          "And here is the content of the bill:\n",
          $bill->content();
}

Between the Senate website, Thomas, and WWW::Mechanize, this isn't so far-fetched at all.


No need for XML::Filter::Simple

grantm on 2003-06-15T02:24:40

XML::Simple can pretty much do that already. The difference is that in your example you ignore the return value from the parse and the hashref (or 'simple tree') ends up inside the handler object itself. But this is how you'd do it with XML::Simple:

    my $xs = XML::Simple->new(keyattr => {}, ...etc...);
    my $parser = XML::SAX::ParserFactory->parser( Handler => $xs );
    my $foo = $parser->parse_string( $xml );

    print $foo->{ a };        # prints 1
    print $foo->{ bar }[0];    # prints baz
    print $foo->{ bar }[1];    # prints bez

You'll need version 2.x of XML::Simple, but that's all you'll need.

On a related note, I've had some thoughts about building a module called XML::Filter::Simple which would allow you to easily build a SAX filter for common operations such as deleting or renaming elements. It's all vapourware at the moment, but something might come of it one day.

Re:No need for XML::Filter::Simple

inkdroid on 2003-06-15T11:21:29

Woah, great! I guess I had an old version of XML::Simple lying around.

Re:No need for XML::Filter::Simple

inkdroid on 2003-06-15T11:39:55

I would really like to have access to the 'simple tree' while I am filtering. The reason is that I am working with huge XML documents (of varying content), and I would like to be able to extract portions of them as the parse is proceeding.

I imagine a Data::Dumper::Dumper call on the blessed object will reveal where the simple tree is being built up, so I can be bad and dig into the object myself. But I'd rather not do that. Any ideas?

Re:No need for XML::Filter::Simple

grantm on 2003-06-15T21:46:12

You might want to look into XML::Filter::Dispatcher, which I think can do what you want (i.e., treat sections of the document as documents in their own right).

Digging inside XML::Simple.pm wouldn't actually help (and might harm your sanity). All the handler does is accumulate the events in an XML::Parser Tree-style structure; only once the whole document has been parsed does it use the 'collapse' method to convert that structure to a simple tree.
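
To make the two-step idea concrete, here is a rough sketch in plain Perl. The `$tree` literal is what an XML::Parser Tree-style structure looks like for the `foo` example from the original post. The `collapse_sketch` function is a made-up, much-simplified stand-in for illustration only; it is not XML::Simple's actual code and ignores nearly all of its options (it always folds child elements into arrays and assumes no attribute shares a name with a child element).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# XML::Parser Tree style: [ tag, [ \%attrs, child_tag, child_content, ... ] ]
# Text nodes appear as the pseudo-tag 0 followed by the string.
my $tree = [
    'foo',
    [
        { a => '1' },
        'bar', [ {}, 0, 'baz' ],
        'bar', [ {}, 0, 'bez' ],
    ],
];

# Hypothetical helper (an assumption, not part of XML::Simple):
# fold one element's content list into a hashref or a plain string.
sub collapse_sketch {
    my ($content) = @_;
    my ( $attrs, @pairs ) = @$content;
    my %out  = %$attrs;    # attributes become hash keys
    my $text = '';

    while ( my ( $tag, $value ) = splice @pairs, 0, 2 ) {
        if ( !ref $value ) {    # text node: value is a string
            $text .= $value;
            next;
        }
        # child element: value is its own content list
        push @{ $out{$tag} }, collapse_sketch($value);
    }

    return $text unless %out;    # text-only element becomes a string
    $out{content} = $text if length $text;
    return \%out;
}

# The root element name is dropped, as in the examples above.
my ( $root_name, $root_content ) = @$tree;
my $simple = collapse_sketch($root_content);

print "$simple->{a}\n";         # 1
print "$simple->{bar}[0]\n";    # baz
print "$simple->{bar}[1]\n";    # bez
```

The point is that the fold only runs once the complete tree exists, which is why there is no partial simple tree to peek at mid-parse.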

Re:No need for XML::Filter::Simple

mir on 2003-07-15T12:43:21

A little late... you can do this using XML::Twig: the latest version lets you use the simplify method on any element of the tree. simplify gives you the same structure as XML::Simple. And of course you can use it during the parsing, so you can deal with parts of the tree.

Example:

#!/usr/bin/perl -w
use strict;
use XML::Twig;
use YAML;

XML::Twig->new( twig_roots => { foo => \&foo })
         ->parse( \*DATA);

sub foo
  { my( $t, $foo)= @_;
    my $data= $foo->simplify( forcearray => [ 'bar' ]);
    print Dump $data;
    $t->purge; # to release the memory
  }

# note that the extra ; seem to be added by the slash code
__DATA__
<doc>
  <foo a="1"><bar>baz1</bar><bar>bez2</bar></foo>
  <foo a="2"><bar>baz3</bar><bar>bez4</bar><bar>bez5</bar></foo>
  <foo a="3"><bar>bez6</bar></foo>
</doc>

Re: Politics::US

chaoticset on 2003-08-01T03:26:34

Well, I emailed the people running GIA about whether they had an API, but they haven't written me back. I'm sure that if they do then a module encapsulating the easiest uses of it can't be far behind, and if there isn't an API, I can't see why there wouldn't be one soon.

Re: Politics::US

inkdroid on 2003-08-01T14:03:53

Neat. Let me know if you hear anything. In the meantime I'm seriously thinking of writing a module for Thomas with a nice OO interface and WWW::Mechanize in the background. Interested? I'd like to come up with the API first, and work from there.
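
As a starting point for the API-first approach, something like the following skeleton might do. Everything in it is hypothetical: the class and method names are borrowed from the wish-list code in the original post, the `votes => [...]` constructor argument is an assumption added so the sketch runs without any backend, and the values are placeholders, not real voting data. Nothing here talks to Thomas or uses WWW::Mechanize.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical value class for a single vote.
package Politics::US::Vote;

sub new {
    my ( $class, %args ) = @_;
    my $self = {
        decision   => $args{decision},
        billNumber => $args{billNumber},
    };
    return bless $self, $class;
}

sub decision   { return $_[0]{decision} }
sub billNumber { return $_[0]{billNumber} }

# Hypothetical senator class. A real backend would fetch the votes;
# here the caller injects them (an assumption made for this sketch).
package Politics::US::Senator;

sub new {
    my ( $class, $name, %args ) = @_;
    my $self = { name => $name, votes => $args{votes} || [] };
    return bless $self, $class;
}

sub name  { return $_[0]{name} }
sub votes { return @{ $_[0]{votes} } }

package main;

# Placeholder data only, to exercise the interface.
my $senator = Politics::US::Senator->new(
    'Durbin, Richard',
    votes => [
        Politics::US::Vote->new(
            decision   => 'PLACEHOLDER-DECISION',
            billNumber => 'PLACEHOLDER-BILL',
        ),
    ],
);

foreach my $vote ( $senator->votes() ) {
    printf "%s cast a vote of %s regarding %s\n",
        $senator->name(), $vote->decision(), $vote->billNumber();
}
```

With the interface pinned down like this, the scraping code could be swapped in behind `votes()` later without changing any calling code.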

Re: Politics::US

chaoticset on 2003-08-04T13:46:10

Thomas looks neat (and very tentacle-y, in the amount of information it provides...), but I don't know doodly about WWW::Mechanize yet.

I'd definitely be interested to hear about it -- I personally try to be a political agnostic, but it doesn't always work.