Improvement Needed - Perl needs better benchmarking

Alias on 2009-09-07T16:41:36

My previous post on Moose contained a fairly typical benchmark, used in a fairly typical way.



# Object::Tiny vs Object::Tiny::XS vs Moose::Tiny
# Note that I had to do __PACKAGE__->meta->make_immutable to
# the Moose::Tiny class, because it doesn't do so itself.

# Based on the following Object::Tiny class.
use Object::Tiny qw{ one two three four five };

D:\DOCUME~1\ADAM~1.KEN\Desktop>perl moosemark.pl
                 Rate  moose_all  xs_all  tiny_all  moose_none  tiny_none  xs_none
moose_all     82795/s         --    -54%      -54%        -72%       -90%     -92%
xs_all       178763/s       116%      --       -1%        -39%       -79%     -83%
tiny_all     179759/s       117%      1%        --        -39%       -79%     -83%
moose_none   294985/s       256%     65%       64%          --       -65%     -72%
tiny_none    842460/s       918%    371%      369%        186%         --     -21%
xs_none     1066098/s      1188%    496%      493%        261%        27%       --
(warning: too few iterations for a reliable count)
                 Rate  moose_get  tiny_get  xs_get
moose_get   1730104/s         --      -11%    -32%
tiny_get    1937984/s        12%        --    -24%
xs_get      2557545/s        48%       32%      --
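For reference, the script behind numbers like these is nothing exotic. A simplified sketch (not the exact moosemark.pl, and assuming Moose::Tiny takes an Object::Tiny-style attribute list) looks like this:

#!/usr/bin/perl
# Simplified sketch of the kind of script that generates the output above.
# Not the exact moosemark.pl; Moose::Tiny is assumed to take an
# Object::Tiny-style attribute list.
use strict;
use warnings;
use Benchmark 'cmpthese';

package My::Tiny;
use Object::Tiny qw{ one two three four five };

package My::TinyXS;
use Object::Tiny::XS qw{ one two three four five };

package My::Moose;
use Moose::Tiny qw{ one two three four five };
__PACKAGE__->meta->make_immutable;   # as noted above, it doesn't do this itself

package main;

my %args = ( one => 1, two => 2, three => 3, four => 4, five => 5 );

# Construction, with all five attributes and with none
cmpthese( -2, {
    tiny_all   => sub { My::Tiny->new(%args)   },
    xs_all     => sub { My::TinyXS->new(%args) },
    moose_all  => sub { My::Moose->new(%args)  },
    tiny_none  => sub { My::Tiny->new          },
    xs_none    => sub { My::TinyXS->new        },
    moose_none => sub { My::Moose->new         },
} );

# Accessor reads
my $tiny  = My::Tiny->new(%args);
my $xs    = My::TinyXS->new(%args);
my $moose = My::Moose->new(%args);
cmpthese( -2, {
    tiny_get  => sub { $tiny->one  },
    xs_get    => sub { $xs->one    },
    moose_get => sub { $moose->one },
} );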



Now, I'm not showing this again to rub in that Moose is slower than Class::XSAccessor, because even as recently as today there is XS acceleration code about to land and start making up the difference.

The problem here is that this benchmark isn't tangible.

You can't reuse it. You can't identify it. You can't package it or upload it to the CPAN. The closest you can get is to just randomly drop the raw script somewhere and hope for the best.

And god forbid you want a benchmark that will try to establish theta or O(), a benchmark that runs itself across a set or a range of n and produces a chart, or auto-discovers O(). Or varies over multiple variables.
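The mechanics aren't even hard. A throwaway sketch of the "range of n" part, using nothing but the stock Benchmark module, is only a few lines; the problem is that there's nowhere standard to put it and nothing standard to consume its output:

use strict;
use warnings;
use Benchmark 'countit';

# Throwaway sketch: time the same operation at several input sizes so the
# growth rate can at least be eyeballed or charted by hand.
for my $n ( 10, 100, 1_000, 10_000 ) {
    my @data = map { int rand 1_000_000 } 1 .. $n;
    my $t    = countit( 1, sub { my @sorted = sort { $a <=> $b } @data } );
    printf "n=%-6d %10.0f runs/sec\n", $n, $t->iters / $t->cpu_p;
}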

A benchmark with encoded results, in the same way that TAP is an encoding of test results. A benchmark I can save in a /bm/ directory, the same way that I can save my tests in a /t/ directory.
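To be clear about what I'm wishing for: none of this exists, but a hypothetical bm/accessors.b could be no scarier than a test script, emitting one structured line per case instead of a human-only table. The format below is made up purely for illustration:

#!/usr/bin/perl
# Purely hypothetical: neither the bm/ convention nor the output format below
# exists anywhere. The point is only that a benchmark script could emit
# structured, TAP-like results instead of a human-only table.
use strict;
use warnings;
use Benchmark 'countit';

my %cases = (
    string_concat => sub { my $s = join '', map { "x$_" } 1 .. 100 },
    string_x      => sub { my $s = 'x' x 100 },
);

print "bm 1..", scalar keys %cases, "\n";     # a plan line, TAP-style
my $i = 0;
for my $name ( sort keys %cases ) {
    my $t = countit( 1, $cases{$name} );
    printf "bm %d %s rate=%.0f/s iters=%d cpu=%.2fs\n",
        ++$i, $name, $t->iters / $t->cpu_p, $t->iters, $t->cpu_p;
}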

So where is it? Where is my Class::Benchmark? My Benchmark::Archive that uploads to my Benchmark::Smolder?

Dear LazyWeb.

Can you make us a benchmark stack that is as good as our testing stack?

Please...?


I have the answer

Burak on 2009-09-07T19:19:08

42!

benchmarks are tests

daxim on 2009-09-07T20:27:31

You can't package it or upload it to the CPAN.

http://search.cpan.org/dist/App-Benchmark-Accessors/

http://hanekomu.at/blog/tags/benchmarks

Re:benchmarks are tests

Alias on 2009-09-08T01:06:24

The former embeds the benchmarks into the test suite; they aren't installed.

DDT & Benchmarking

Hercynium on 2009-09-08T02:27:44

I wonder if your post today was inspired by what I wrote in reply to your last post, or if we're just on the same wavelength with this. :)

At YAPC::NA this year I did a lightning talk on a web-testing framework I built for a client*. The main topic was just that the tests were all data-driven, and because of that I gained a pile of great functionality, not least of which was the ability to turn my test suite into a benchmarking suite seamlessly.

Run via prove or directly, each script was a set of unit tests. Run under a utility I called "bench", those same test scripts became benchmarks (unless marked as not suitable in the test data structure).
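The pattern is roughly this (a from-memory sketch rather than the client's code; the case structure, the BENCH switch and get_page() are all stand-ins):

use strict;
use warnings;
use Test::More;
use Benchmark 'cmpthese';

# Each case is pure data: a name, the code to run, what a correct result
# looks like, and whether it makes sense as a benchmark.
my @cases = (
    { name => 'front_page', benchmark => 1,
      run  => sub { get_page('/') },      expect => qr/Welcome/  },
    { name => 'login_form', benchmark => 0,
      run  => sub { get_page('/login') }, expect => qr/Password/ },
);

if ( $ENV{BENCH} ) {
    # "bench" mode: loop only the cases marked as suitable
    my %subs = map  { ( $_->{name} => $_->{run} ) }
               grep { $_->{benchmark} } @cases;
    cmpthese( -2, \%subs );
}
else {
    # normal mode: the same data runs as unit tests under prove
    like( $_->{run}->(), $_->{expect}, $_->{name} ) for @cases;
    done_testing();
}

sub get_page {
    # stand-in for whatever actually fetches a page in the real suite
    my ($path) = @_;
    return $path eq '/login' ? '<form>Password</form>' : '<h1>Welcome</h1>';
}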

*Funny story - schwern told me "Congratulations, you've just re-invented Selenium!" Lo and behold I looked at Selenium and showed it to the client and they are *delighted* that I threw away a month's work! ;-)

Re:DDT & Benchmarking

Alias on 2009-09-08T03:22:03

It's mostly because of some issues at work.

We've got ourselves a shiny new load testing environment, which is useful for catching major performance regressions.

But it doesn't easily catch slowdowns in strange and unusual places. Stuff beyond just the "scrape the website" tests, things that only run in the backend and do a lot of heavy lifting.

What we'd like is a way to have a benchmark suite that sits in the project, runs nightly with the smoke tests, and logs to a database so that we can do performance trend-spotting on specific functions or processes.
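The recording side doesn't need to be clever. Something as small as this sketch (SQLite and a made-up table, purely for illustration) would be enough to start trending a single hot function:

use strict;
use warnings;
use Benchmark qw(timethis);
use DBI;   # plus DBD::SQLite, just for the sake of the example

# Made-up schema, purely for illustration
my $dbh = DBI->connect( 'dbi:SQLite:dbname=benchmarks.db', '', '',
    { RaiseError => 1 } );
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS results (
        run_at   TEXT,
        name     TEXT,
        iters    INTEGER,
        cpu_secs REAL
    )
});

# Time one specific backend function and record the result for trend-spotting
my $t = timethis( 10_000, sub { heavy_lifting() }, 'heavy_lifting', 'none' );
$dbh->do(
    q{INSERT INTO results (run_at, name, iters, cpu_secs)
      VALUES (datetime('now'), ?, ?, ?)},
    undef, 'heavy_lifting', $t->iters, $t->cpu_p,
);

sub heavy_lifting {
    # placeholder for the real backend code
    my $x = 0;
    $x += $_ for 1 .. 100;
    return $x;
}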

Web performance testing is important too, but more complex. What we need most is a way to test more tightly than that.

Re:DDT & Benchmarking

tgape on 2009-09-09T02:49:39

I think Alias addressed most of my would-be response, so I'll limit my comments to:

All of the efforts I've personally seen to turn unit tests into benchmarks ended up being more of a benchmark of perl's compiler than anything else.  Unit tests tend to focus on tiny portions of the overall code, and test just them; they tend to not do unnecessary looping.  As such, the naive 'turn them into benchmarks' puts the loop around them, and unfortunately either re-evals them, or re-forks and evals them.

One would be better off turning one's functional tests into benchmarks - on average (at least for me) they test a larger portion of the code per script, but tend to not be as complex, proportionately speaking.  Some of my functional tests have even been distilled down to the point of "Directory A contains a bunch of input files, directory B contains files with the same names which are their corresponding output files.  See if they generate the right output."  When that's running against a persistent process, it can be looped without too much worry.
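In sketch form (the directory names and process_file() here are placeholders), wrapping a timing loop around that kind of script is trivial:

use strict;
use warnings;
use Test::More;
use Benchmark 'timethese';

# Sketch only: the directory names and process_file() are placeholders for
# whatever the persistent process under test actually does.
my $in_dir  = 'data/input';      # "directory A"
my $out_dir = 'data/expected';   # "directory B"

opendir my $dh, $in_dir or die "Can't open $in_dir: $!";
my @names = sort grep { !/^\./ } readdir $dh;
closedir $dh;

# Functional test: every input file must produce its expected output
for my $name (@names) {
    is( process_file("$in_dir/$name"), slurp("$out_dir/$name"),
        "output for $name" );
}
done_testing();

# Benchmark: the same loop, timed, when asked for
if ( $ENV{BENCH} ) {
    timethese( 20, {
        all_inputs => sub { process_file("$in_dir/$_") for @names },
    } );
}

sub slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't read $file: $!";
    local $/;
    return scalar <$fh>;
}

sub process_file {
    # placeholder: the real code would feed the file to the persistent process
    return uc slurp( $_[0] );
}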

Of course, the real benchmark is probably more like, "Directory A contains a bunch of input files; each one corresponds to a real session with its data munged to eliminate personal details.  Each session file resets its state at the end.  Directory B contains the expected output for each of these sessions.  Run them all at the same time, repeatedly."

Re:DDT & Benchmarking

Hercynium on 2009-09-09T03:10:31

Indeed, in this case the test harness *is* testing on a more functional/integration level. As we refactor the code that runs the client's site, we want to ensure that we neither break any part of the site nor slow it down.

Still, using this type of technique has made me wonder if there is a good way of encouraging benchmarks of perl modules by piggy-backing on the tests.

An example of this can be found in the test suite for Sort::Maker. http://search.cpan.org/dist/Sort-Maker/ (It was actually Uri's advice to look at that module's tests that inspired the design of my website DDT code.)

I do understand that benchmarking typically needs much more "context" than unit tests, but a test harness with an API that enables/encourages 'performance comparisons' with other code that does the same thing could be a good springboard to get people providing benchmarkable code/tests in their CPAN dists.

Re:DDT & Benchmarking

Alias on 2009-09-09T04:41:21

One of the other problems we have is that because our system is enormous, with many links to external systems and pre-compiled caches, it takes two to three orders of magnitude more time to set up for a functional test than it does to run the actual tests.

I think any sufficiently robust solution is going to need a way to factor that out, and this is a problem for the test script overloading approach.

The test scripts would need a way to isolate the code you are interested in from the code you are running to set up for the actual test code.
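Done by hand it's trivial: build the expensive fixture once, and only time the call you care about, as in the toy sketch below (build_fixture() and interesting_code() are stand-ins). The hard part is getting a test-script-shaped framework to express that split for you.

use strict;
use warnings;
use Benchmark 'cmpthese';

# Toy illustration: the expensive setup happens once, outside the timing
# loop, and only the call we actually care about gets benchmarked.
my $fixture = build_fixture();   # imagine caches, external connections, etc.

cmpthese( -2, {
    interesting_code => sub { interesting_code($fixture) },
} );

sub build_fixture {
    return { data => [ 1 .. 1_000 ] };
}

sub interesting_code {
    my ($fixture) = @_;
    my $sum = 0;
    $sum += $_ for @{ $fixture->{data} };
    return $sum;
}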

Specs welcome

notbenh on 2009-09-08T19:29:29

A few members of pdx.pm (including me) are working on a benchmarking framework, with the intent of being able to build nice charts and graphs to show Rakudo getting faster and to compare it to Perl 5. We're using the Euler problem set just as an example for now, but if you have specific specs that you would like us to hit, I would love to hear them.

http://github.com/notbenh/euler_bench/tree/master

Re:Specs welcome

Alias on 2009-09-09T03:22:38

Apart from a 120 line .pm file with very little in it, I'm not seeing much there in the way of frameworky'ness.

Do you have a more accurate URL?