CPAN Testers Stats - May Update - The Great Wide Open

barbie on 2008-06-03T11:09:33

CPAN Testers Statistics

Once again Andreas took a stab at gaining on Chris, and for the second time in recent months managed to submit the most reports. It's impressive to see that all the major testers are running multiple test environments, to help cover more platform/perl combinations. There are roughly 1000 basic setups possible, with many more variations in libraries, compilers and the like, so although we're certainly making quite an impact, we're still a long way from covering the whole matrix. So if you do have a spare box, please consider whether it could help with CPAN Testers coverage.

Once again the testers have broken several previous records. The monthly total topped 176,793 reports last month, with PASS, FAIL, NA and UNKNOWN all seeing their highest counts since CPAN Testers records began. It would have been nice to see the FAIL and UNKNOWN counts not increase, but they're still fairly consistent at 13% of the total. Andreas raised the barrier a little higher, having submitted 54,764 reports last month. So far he's the only tester to have broken the 50,000 barrier ... twice. However, I suspect that some testers will be pushing 100,000 reports by the end of the year.

This month we've had 14 further addresses added to the list, including 7 new testers, giving us the second highest number of testers in a month: 117. One of the new testers, Jon Allen, you might also know as the guy behind the facelift of perldoc.perl.org, as well as being another Birmingham.pm'er. Jon happened to notice that there weren't a lot of Mac reports, so offered to give CPAN Testers a go. He wrote up his experience of installing and configuring CPAN::Reporter::Smoker, which you can read on the CPAN Testers Wiki. If you're interested in setting up a similar automated testing bot, it's a good how-to. In the coming months I shall endeavour to write a similar one for CPAN::YACSmoke.
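
Jon's write-up covers the installation and configuration details; as a rough idea of how little is involved once CPAN.pm and CPAN::Reporter are already set up, the smoker itself boils down to something like this (a minimal sketch, not a substitute for the wiki how-to):

    # Minimal sketch: assumes CPAN.pm and CPAN::Reporter have already
    # been configured (e.g. via 'o conf init test_report' in the cpan shell).
    use CPAN::Reporter::Smoker;

    # start() loops over recent CPAN uploads, testing each distribution
    # and submitting a report for every PASS, FAIL, NA or UNKNOWN result.
    start();

Or, equivalently, run it as a one-liner from the command line: perl -MCPAN::Reporter::Smoker -e start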

Noticed any significant difference in the stats site these days? For those that haven't, the site is now updated shortly after 2am (Central European Time) every day. Regular tinkering with the scripts behind the scenes has made the daily database updates reliable enough that I can now switch to a daily site refresh too. With that done, I'm now looking at automating the Bad Upload and Bad Report emails that I send out every month. This will hopefully help authors to spot quickly when they've uploaded a badly formatted distribution, rather than having to wait weeks to be alerted, and help testers to quickly block testing of certain distributions by updating their configuration. In the longer term I'd also like to have a report parser that can quickly spot when a testbot has gone rogue, alerting the tester to investigate promptly. This should give authors a bit more confidence in the system, so they don't dismiss reports out of hand.

A further addition to the site came last week, with the "Find A Tester" feature. Several authors now read reports via the web interface to the NNTP server, and they find it difficult to figure out who the tester is for their distributions, particularly when the reports don't get mailed to them. As such I've set up a simple script behind the scenes that does the appropriate lookup. Hopefully, by requiring the NNTP ID of the report, I can avoid putting restrictive spam measures on the script. If you intend to use the form, please use it wisely.
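
The script itself isn't published here, but conceptually the lookup amounts to fetching the report's headers from the news server by article number and reading the From header. A hypothetical sketch, assuming the perl.cpan.testers group on nntp.perl.org:

    # Hypothetical sketch only; not the actual "Find A Tester" script.
    use strict;
    use warnings;
    use Net::NNTP;

    my $id = shift or die "Usage: $0 <nntp-article-id>\n";

    my $nntp = Net::NNTP->new('nntp.perl.org')
        or die "Cannot connect to nntp.perl.org\n";
    $nntp->group('perl.cpan.testers')
        or die "Cannot select group perl.cpan.testers\n";

    # head() returns the article's header lines; the From: header
    # identifies the tester who submitted the report.
    my $head = $nntp->head($id)
        or die "No such article: $id\n";
    my ($from) = grep { /^From:/i } @$head;
    print $from || "No From: header found\n";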


100k barrier

dagolden on 2008-06-03T11:57:52

However, I suspect that some testers will be pushing 100,000 reports by the end of the year.

FWIW, I'm aiming to break 100,000 reports by the end of June.

It's not about adding boxes, it's about making the testing process more efficient. If all goes well, I'll write it up as an article somewhere.

-- dagolden

p.s. thanks for the daily stats update -- it's fun to see my numbers climb

Re:100k barrier

barbie on 2008-06-03T12:59:30

It's not about adding boxes

Agreed. In the majority of cases those submitting reports are only testing against 1 or 2 versions of perl. I haven't looked at the distribution of platforms/perls with regards to testers, but that's something I'm going to start looking into. The prolific testers tend to have 5+ versions on each platform.

Once I get the alert system working on a frequent basis, it should also help quickly warn testers that rogue results are being reported and allow them to adjust their testing environment accordingly. This will then give everyone a bit of confidence that despite the sheer volume of reports, nearly all the reports are valid. With the HTTP server in place, it might then be a little easier to delete bogus reports.

I think this year is going to be interesting to say the least :)

The graphs break now...

Alias on 2008-06-04T03:46:03

The problem with updating nightly is that the graphs go all broken...

You probably need to do some forward estimating on the test counts...

So after 2 days of the month, inflate that to estimate the total for the full 31 days.
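
In other words, pro-rate the month-to-date count; a rough, illustrative sketch:

    # Pro-rate the month-to-date count to a full-month estimate.
    # The variable names are purely illustrative.
    sub estimate_month_total {
        my ($count_so_far, $days_elapsed, $days_in_month) = @_;
        return 0 unless $days_elapsed;
        return int( $count_so_far * $days_in_month / $days_elapsed );
    }

    # e.g. 11,400 reports after 2 days of a 31-day month
    print estimate_month_total(11_400, 2, 31), "\n";   # prints 176700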