Time to Speed Up The Tests Again

Ovid on 2008-05-14T10:58:56

After a lot of aggressive work improving our test suite performance, adding 50% more tests has tremendously slowed us down.

All tests successful.
Files=48, Tests=14025, 1386 wallclock secs ( 3.06 usr  0.46 sys + 1035.71 cusr 42.77 csys = 1082.00 CPU)
Result: PASS
Number of test programs: 48
Total runtime approximately 23 minutes 5 seconds

Ten slowest tests:
+---------+-----------------------------------+
| Time    | Test                              |
+---------+-----------------------------------+
| 13m 25s | t/acceptance.t                    |
| 5m 36s  | t/aggregate.t                     |
| 0m 36s  | t/standards/use.t                 |
| 0m 24s  | t/system/both/import/log/pager.t  |
| 0m 20s  | t/system/both/import/log/search.t |
| 0m 11s  | t/system/both/import/log/log.t    |
| 0m 11s  | t/unit/db/migrations.t            |
| 0m 11s  | t/unit/fip/record-changes.t       |
| 0m 10s  | t/unit/piptest/pprove/testdb.t    |
| 0m 9s   | t/test_class_tests.t              |
+---------+-----------------------------------+

(Note that the acceptance and aggregate tests both emcompass the vast majority of those 14000 tests and simple aggregation is no longer a win.)

Since we're running at roughly 10 tests a second, I started paying attention to where we can gain wins and it looks like XML::XPath is a likely candidate for performance improvements. In fact, the code has a number of areas where it could be improved substantially. It's not been updated in years and I've contacted Matt Sergeant about having a colleague take it over. This could also be an excellent opportunity to pay attention to its RT queue.

 $ dprofpp
Total Elapsed Time = 35.55858 Seconds
  User+System Time = 13.44858 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 3.32   0.446  0.655  39692   0.0000 0.0000  :/home/poec01/trunk/deps/lib
                                             /perl5//XML/XPath/XMLParser.pm:63
 2.94   0.395  1.373   5544   0.0001 0.0002  XML::XPath::XMLParser::parse_start
 2.35   0.316  0.385  20180   0.0000 0.0000  :/home/poec01/trunk/deps/lib
                                             /perl5//XML/XPath/Node.pm:236
 2.04   0.275  3.025  30768   0.0000 0.0001  :/opt/csw/lib/perl/site_perl
                                             /XML/Parser.pm:187
 1.84   0.248  1.229  19680   0.0000 0.0001  XML::XPath::XMLParser::parse_char
 1.84   0.247  0.247  48278   0.0000 0.0000  Class::Accessor::Grouped::get_simp
                                             le
 1.83   0.246  0.598  20196   0.0000 0.0000  :/home/poec01/trunk/deps_pat
                                             ched/lib/Test/XML/XPath.pm:67
 1.80   0.242  0.337  19252   0.0000 0.0000  Class::Accessor::Grouped::get_inhe
                                             rited
 1.50   0.202  0.202  11760   0.0000 0.0000  XML::XPath::XMLParser::_namespace
 1.42   0.191  1.113  11136   0.0000 0.0001  XML::XPath::Node::ElementImpl::DES
                                             TROY
 1.32   0.178  0.178  11200   0.0000 0.0000  XML::XPath::Node::ElementImpl::get
                                             Attributes
 1.28   0.172  0.418   1361   0.0001 0.0003  DBIx::Class::ResultSet::new
 1.23   0.166  0.869    549   0.0003 0.0016  base::import
 1.22   0.164  0.362  13872   0.0000 0.0000  XML::XPath::Node::ElementImpl::app
                                             endChild
 1.20   0.162  3.187     24   0.0068 0.1328  XML::Parser::Expat::ParseString

Just that list above shows that almost 16% of our time is spent in XML::XPath.


Good luck with that

mir on 2008-05-14T16:11:12

Is there any reason why you could not switch from XML::XPath to XML::LibXML? The code should be portable, except for a few constants that need to be renamed, I think.

In any case, if you have spotted improvements that can be made to the XPath engine of XML::XPath (as opposed to its DOM implementation) I would be interested to hear about it for XML::XPathEngine, which is a fork of that engine.

Thanks

Re:Good luck with that

Ovid on 2008-05-14T16:26:08

A quick switch of using XML::LibXML for Test::XML::XPath doubled the test time for one of our tests. Of course, that was a single test run and there could have been other issues involved.

The major issues I saw for XML::XPath were getting rid of a lot of "shift" statements in favor of list assignments directly from @_. There is at least one wrapper function which should be an alias and a rather curious looking AUTOLOAD. Also, some of the C-style for loops can be worked on.

Other than that, I haven't looked that closely.

Making XML::LibXML faster

arc on 2008-05-20T13:46:31

[Sorry about the tardiness of this reply.]

If XML::LibXML takes a lot longer than XML::XPath, there’s probably something bad going on.

In this case, I’d guess that the “something bad” is that XML::LibXML (or rather, the underlying libxml2 library) is loading external DTDs over the network. I can parse an 18-megabyte XML instance from a file in a little under a second on my laptop, once I turn off DTD loading.

Turning off the ”run very slowly” flag can be done like this:

my $parser = XML::LibXML->new;
$parser->load_ext_dtd(0);

my $doc = $parser->parse_file($xml_filename);

If you’re thinking that loading external DTDs from the network is an undesirable default, I’m inclined to agree with you.

[Credit where credit’s due: though I have seen and dealt with this issue elsewhere, I was reminded of it just now by this mailing-list posting by Aristotle Pagaltzis.]

Re:Making XML::LibXML faster

Ovid on 2008-05-20T14:11:50

You are correct that something was wrong. Turns out it was a completely unrelated error. Regrettably, we've discovered that we can't just switch over to XML::LibXML because we have two different namespaces and XML::XPath was letting us ignore this issue (this was in place long before I started). We have about 237 YAML documents driving our acceptance tests and they have multiple xpaths embedded which would presumably have to be changed to deal with this :/

Re:Making XML::LibXML faster

Aristotle on 2008-05-20T14:39:31

In the interest of correctness it should be pointed out that XML::XPath is in violation of the XPath spec here, and that matching namespaced elements in XPath always requires binding a prefix to the namespace.

Re:Making XML::LibXML faster

Ovid on 2008-05-20T17:52:07

We are aware of this. We're just in a bad position with some legacy code and since it hadn't been addressed previously, we need to figure out the easiest way to move forward.