I want two CPAN modules, which may or may not exist:
XML::Filter::Simple
An XML::SAX handler that does what XML::Simple does. So if I have some XML like this:
<foo a="1"><bar>baz</bar><bar>bez</bar></foo>
I would be able to do this:
my $foo = XML::Filter::Simple->new();
my $parser = XML::SAX::ParserFactory->parser( Handler => $foo );
$parser->parse_string( $xml );
## same kind of data structure as XML::Simple
print $foo->{ a }; # prints 1
print $foo->{ bar }[0]; # prints baz
print $foo->{ bar }[1]; # prints bez
Politics::US
I want to be able to keep up to date with the goings-on of my congressman and senators, and I want Perl to help me.
use Politics::US::Senator;
use Politics::US::Bill;
my $senator = Politics::US::Senator->new( 'Durbin, Richard' );
foreach my $vote ( $senator->votes() ) {
    my $decision = $vote->decision();
    my $billNum = $vote->billNumber();
    my $bill = Politics::US::Bill->new( $billNum );
    print "Today Durbin cast a vote of $decision regarding $billNum\n",
        "The bill's title is: ", $bill->title(), "\n",
        "And here is the content of the bill:\n", $bill->content();
}
Between the Senate website, Thomas, and WWW::Mechanize, this isn't so far-fetched at all.
XML::Simple can pretty much do that already. The difference is that in your example you ignore the return value from the parse and the hashref (or 'simple tree') ends up inside the handler object itself. But this is how you'd do it with XML::Simple:
my $xs = XML::Simple->new(keyattr => {},...etc...);
my $parser = XML::SAX::ParserFactory->parser( Handler => $xs );
my $foo = $parser->parse_string( $xml );
print $foo->{ a }; # prints 1
print $foo->{ bar }[0]; # prints baz
print $foo->{ bar }[1]; # prints bez
You'll need version 2.x of XML::Simple, but that's all you'll need.
On a related note, I've had some thoughts about building a module called XML::Filter::Simple which would allow you to easily build a SAX filter for common operations such as deleting or renaming elements. It's all vapourware at the moment, but something might come of it one day.
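To make the idea concrete, here is a minimal sketch of a renaming filter built on XML::SAX::Base. It is not the vapourware module itself: the package names RenameFilter and Recorder and the Rename option are hypothetical, namespaces are ignored, and deleting elements is left out. The Recorder handler exists only so the result can be inspected.

```perl
use strict;
use warnings;

package RenameFilter;            # hypothetical name, for illustration only
use base 'XML::SAX::Base';

# Return a copy of the element with its name swapped, or the element
# unchanged if it is not listed in the (hypothetical) Rename option.
sub _renamed {
    my ($self, $el) = @_;
    my $new = $self->{Rename}{ $el->{Name} } or return $el;
    return { %$el, Name => $new, LocalName => $new };
}

sub start_element {
    my ($self, $el) = @_;
    $self->SUPER::start_element( $self->_renamed($el) );
}

sub end_element {
    my ($self, $el) = @_;
    $self->SUPER::end_element( $self->_renamed($el) );
}

package Recorder;                # hypothetical downstream handler
sub new { return bless { names => [] }, shift }
sub start_element {
    my ($self, $el) = @_;
    push @{ $self->{names} }, $el->{Name};
}

package main;
use XML::SAX::ParserFactory;

my $rec    = Recorder->new;
my $filter = RenameFilter->new( Handler => $rec, Rename => { bar => 'item' } );
my $parser = XML::SAX::ParserFactory->parser( Handler => $filter );
$parser->parse_string('<foo a="1"><bar>baz</bar><bar>bez</bar></foo>');

print join(' ', @{ $rec->{names} }), "\n";   # prints foo item item
```

Extra constructor arguments such as Rename simply become fields of the filter object, which is why no custom new() is needed here.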
Re:No need for XML::Filter::Simple
inkdroid on 2003-06-15T11:21:29
Woah, great! I guess I had an old version of XML::Simple lying around.
Re:No need for XML::Filter::Simple
inkdroid on 2003-06-15T11:39:55
I would really like to have access to the 'simple tree' while I am filtering. The reason for this is that I am working with huge XML documents (of varying content), and I would like to be able to extract portions of it as the parse is proceeding.
I imagine a Data::Dumper::Dumper call on the blessed object will reveal where the simple tree is being built up, so I can be bad and dig into the object myself. But I'd rather not do that. Any ideas?
Re:No need for XML::Filter::Simple
grantm on 2003-06-15T21:46:12
You might want to look into XML::Filter::Dispatcher which I think can do what you want (ie: treat sections of the document as documents in their own right).
Digging inside XML::Simple.pm wouldn't actually help (and might harm your sanity), since all the handler does is accumulate the events in an XML::Parser Tree-style structure. Only once the whole document has been parsed does it use the 'collapse' method to convert that structure to a simple tree.
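To illustrate the two stages, here is a much-simplified sketch of collapsing a Tree-style structure into a simple tree. This is not XML::Simple's actual code: the function name collapse_node is made up, child elements are always folded into arrays (as if forcearray were on for everything), and it assumes an attribute never shares a name with a child element.

```perl
use strict;
use warnings;

# Tree-style structure for: <foo a="1"><bar>baz</bar><bar>bez</bar></foo>
# Each element's content is [ \%attributes, tag, content, tag, content, ... ];
# text is stored under the pseudo-tag 0.
my $tree = [ 'foo', [ { a => 1 },
                      'bar', [ {}, 0, 'baz' ],
                      'bar', [ {}, 0, 'bez' ] ] ];

# Hypothetical, simplified stand-in for the collapse step.
sub collapse_node {
    my ($content) = @_;
    my ($attrs, @pairs) = @$content;
    my %node = %$attrs;                 # attributes become hash keys
    my $text = '';
    while (@pairs) {
        my ($tag, $child) = splice @pairs, 0, 2;
        if ($tag eq '0') {              # text node
            $text .= $child;
            next;
        }
        push @{ $node{$tag} }, collapse_node($child);
    }
    return $text unless %node;          # text-only element: plain string
    $node{content} = $text if $text =~ /\S/;
    return \%node;
}

my $foo = collapse_node( $tree->[1] );
print $foo->{a}, "\n";        # prints 1
print $foo->{bar}[0], "\n";   # prints baz
print $foo->{bar}[1], "\n";   # prints bez
```

Because the collapse only runs on the finished tree, nothing resembling the simple tree exists inside the handler while the parse is still in progress.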
Re:No need for XML::Filter::Simple
mir on 2003-07-15T12:43:21
A little late... you can do this using XML::Twig: the latest version lets you use the simplify method on any element of the tree. simplify gives you the same structure as XML::Simple. And of course you can use it during the parsing, so you can deal with parts of the tree.
Example:
#!/usr/bin/perl -w
use strict;
use XML::Twig;
use YAML;
XML::Twig->new( twig_roots => { foo => \&foo })
         ->parse( \*DATA);

sub foo
  { my( $t, $foo)= @_;
    my $data= $foo->simplify( forcearray => [ 'bar' ]);
    print Dump $data;
    $t->purge; # to release the memory
  }
# note that the extra ; seem to be added by the slash code
__DATA__
<doc>
<foo a="1"><bar>baz1</bar><bar>bez2</bar></foo>
<foo a="2"><bar>baz3</bar><bar>bez4</bar><bar>bez5</bar></foo>
<foo a="3"><bar>bez6</bar></foo>
</doc>
Re: Politics::US
inkdroid on 2003-08-01T14:03:53
Neat. Let me know if you hear anything. In the meantime I'm seriously thinking of writing a nice OO interface to Thomas, with WWW::Mechanize in the background. Interested? I'd like to come up with the API first, and work from there.
Re: Politics::US
chaoticset on 2003-08-04T13:46:10
Thomas looks neat (and very tentacle-y, in the amount of information it provides...), but I don't know doodly about WWW::Mechanize yet. I'd definitely be interested to hear about it -- I personally try to be a political agnostic, but it doesn't always work.