After a lot of aggressive work improving our test suite performance, adding 50% more tests has tremendously slowed us down.
All tests successful. Files=48, Tests=14025, 1386 wallclock secs ( 3.06 usr 0.46 sys + 1035.71 cusr 42.77 csys = 1082.00 CPU) Result: PASS Number of test programs: 48 Total runtime approximately 23 minutes 5 seconds Ten slowest tests: +---------+-----------------------------------+ | Time | Test | +---------+-----------------------------------+ | 13m 25s | t/acceptance.t | | 5m 36s | t/aggregate.t | | 0m 36s | t/standards/use.t | | 0m 24s | t/system/both/import/log/pager.t | | 0m 20s | t/system/both/import/log/search.t | | 0m 11s | t/system/both/import/log/log.t | | 0m 11s | t/unit/db/migrations.t | | 0m 11s | t/unit/fip/record-changes.t | | 0m 10s | t/unit/piptest/pprove/testdb.t | | 0m 9s | t/test_class_tests.t | +---------+-----------------------------------+
(Note that the acceptance and aggregate tests both emcompass the vast majority of those 14000 tests and simple aggregation is no longer a win.)
Since we're running at roughly 10 tests a second, I started paying attention to where we can gain wins and it looks like XML::XPath is a likely candidate for performance improvements. In fact, the code has a number of areas where it could be improved substantially. It's not been updated in years and I've contacted Matt Sergeant about having a colleague take it over. This could also be an excellent opportunity to pay attention to its RT queue.
$ dprofpp Total Elapsed Time = 35.55858 Seconds User+System Time = 13.44858 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 3.32 0.446 0.655 39692 0.0000 0.0000:/home/poec01/trunk/deps/lib /perl5//XML/XPath/XMLParser.pm:63 2.94 0.395 1.373 5544 0.0001 0.0002 XML::XPath::XMLParser::parse_start 2.35 0.316 0.385 20180 0.0000 0.0000 :/home/poec01/trunk/deps/lib /perl5//XML/XPath/Node.pm:236 2.04 0.275 3.025 30768 0.0000 0.0001 :/opt/csw/lib/perl/site_perl /XML/Parser.pm:187 1.84 0.248 1.229 19680 0.0000 0.0001 XML::XPath::XMLParser::parse_char 1.84 0.247 0.247 48278 0.0000 0.0000 Class::Accessor::Grouped::get_simp le 1.83 0.246 0.598 20196 0.0000 0.0000 :/home/poec01/trunk/deps_pat ched/lib/Test/XML/XPath.pm:67 1.80 0.242 0.337 19252 0.0000 0.0000 Class::Accessor::Grouped::get_inhe rited 1.50 0.202 0.202 11760 0.0000 0.0000 XML::XPath::XMLParser::_namespace 1.42 0.191 1.113 11136 0.0000 0.0001 XML::XPath::Node::ElementImpl::DES TROY 1.32 0.178 0.178 11200 0.0000 0.0000 XML::XPath::Node::ElementImpl::get Attributes 1.28 0.172 0.418 1361 0.0001 0.0003 DBIx::Class::ResultSet::new 1.23 0.166 0.869 549 0.0003 0.0016 base::import 1.22 0.164 0.362 13872 0.0000 0.0000 XML::XPath::Node::ElementImpl::app endChild 1.20 0.162 3.187 24 0.0068 0.1328 XML::Parser::Expat::ParseString
Just that list above shows that almost 16% of our time is spent in XML::XPath.
Is there any reason why you could not switch from XML::XPath to XML::LibXML? The code should be portable, except for a few constants that need to be renamed, I think.
In any case, if you have spotted improvements that can be made to the XPath engine of XML::XPath (as opposed to its DOM implementation) I would be interested to hear about it for XML::XPathEngine, which is a fork of that engine.
Thanks
Re:Good luck with that
Ovid on 2008-05-14T16:26:08
A quick switch of using XML::LibXML for Test::XML::XPath doubled the test time for one of our tests. Of course, that was a single test run and there could have been other issues involved.
The major issues I saw for XML::XPath were getting rid of a lot of "shift" statements in favor of list assignments directly from @_. There is at least one wrapper function which should be an alias and a rather curious looking AUTOLOAD. Also, some of the C-style for loops can be worked on.
Other than that, I haven't looked that closely.
Making XML::LibXML faster
arc on 2008-05-20T13:46:31
[Sorry about the tardiness of this reply.]
If
XML::LibXML
takes a lot longer thanXML::XPath
, there’s probably something bad going on.In this case, I’d guess that the “something bad” is that
XML::LibXML
(or rather, the underlyinglibxml2
library) is loading external DTDs over the network. I can parse an 18-megabyte XML instance from a file in a little under a second on my laptop, once I turn off DTD loading.Turning off the ”run very slowly” flag can be done like this:
my $parser = XML::LibXML->new;
$parser->load_ext_dtd(0);
my $doc = $parser->parse_file($xml_filename);If you’re thinking that loading external DTDs from the network is an undesirable default, I’m inclined to agree with you.
[Credit where credit’s due: though I have seen and dealt with this issue elsewhere, I was reminded of it just now by this mailing-list posting by Aristotle Pagaltzis.]
Re:Making XML::LibXML faster
Ovid on 2008-05-20T14:11:50
You are correct that something was wrong. Turns out it was a completely unrelated error. Regrettably, we've discovered that we can't just switch over to XML::LibXML because we have two different namespaces and XML::XPath was letting us ignore this issue (this was in place long before I started). We have about 237 YAML documents driving our acceptance tests and they have multiple xpaths embedded which would presumably have to be changed to deal with this
:/ Re:Making XML::LibXML faster
Aristotle on 2008-05-20T14:39:31
In the interest of correctness it should be pointed out that
XML::XPath
is in violation of the XPath spec here, and that matching namespaced elements in XPath always requires binding a prefix to the namespace.Re:Making XML::LibXML faster
Ovid on 2008-05-20T17:52:07
We are aware of this. We're just in a bad position with some legacy code and since it hadn't been addressed previously, we need to figure out the easiest way to move forward.