Web::Scraper (HTML::TreeBuilder::XPath) slowdown on Fedora

miyagawa on 2007-11-25T23:37:13

Today I had an interesting report from a Web::Scraper user, saying that he has a script that runs really quickly (less than 1 sec) on a MacBook but very slowly (50 secs) on an AMD dual-CPU machine. Here's the dprof report:

    Total Elapsed Time = 47.32165 Seconds
      User+System Time = 31.07165 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     51.6   16.03 16.033   6922   0.0023 0.0023  XML::XPathEngine::NodeSet::new
     13.5   4.208  4.208   1777   0.0024 0.0024  XML::XPathEngine::Boolean::True
     13.0   4.048  4.048   1723   0.0023 0.0023  XML::XPathEngine::Literal::new
     11.3   3.518  3.518   1666   0.0021 0.0021  XML::XPathEngine::Boolean::False

We initially thought it was due to some XS module or library issue on dual-CPU machines, but it turned out he was using the perl that comes with Fedora, and the rpm version he had was 5.8.8-10.

As addressed in the RH/Fedora bugzilla, perl 5.8.8 rpms prior to 5.8.8-22 carry a nasty patch that makes every new() (that is, bless) call really slow in classes that use overload. HTML::TreeBuilder::XPath (and hence Web::Scraper) creates a lot of node objects for an HTML page, and XML::XPathEngine::NodeSet definitely overloads operators, which is why its constructor dominates the profile.
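If you want to check whether your own perl is affected, a minimal sketch like the following compares the cost of bless in a plain class with the cost in a class that uses overload. The class names Plain and Overloaded are made up for this example, and the iteration count is arbitrary. I'd expect the two timings to be close on a healthy perl and the overloaded case to be much slower on an affected build, but treat it as a rough indicator rather than a definitive test.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical class with no operator overloading.
package Plain;
sub new { my $class = shift; return bless [], $class }

# Hypothetical class that overloads stringification and boolean
# context, similar in spirit to what an XPath node set class does.
package Overloaded;
use overload
    '""'     => sub { 'overloaded' },
    'bool'   => sub { 1 },
    fallback => 1;
sub new { my $class = shift; return bless [], $class }

package main;

# Construct many objects of each kind and report the time taken.
# The count of 200_000 is an arbitrary choice for illustration.
timethese(200_000, {
    plain      => sub { my $obj = Plain->new },
    overloaded => sub { my $obj = Overloaded->new },
});
```

Run it with the system perl and, if you have one, a perl you built yourself, and compare the "overloaded" lines.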

So this is really down to Fedora's patch to perl. If you run into the same issue on Fedora, check your perl rpm version and upgrade to the latest (5.8.8-22 or later), or build your own perl, which is always a good thing.