CPAN Testers Summary - June 2009 - The Nylon Curtain

barbie on 2009-07-04T13:45:28

Cross posted from the CPAN Testers Blog.

June saw a lot of work behind the scenes for CPAN Testers. At the end of the month David and Ricardo finally got to release Metabase to CPAN, the project key to moving towards CPAN Testers 2.0. If you're interested in helping out or finding out more, join the mailing list, or take a look at the current GitHub repo. David has identified some of the areas still to be worked on, so if you have some tuits to help out, it would be very much appreciated.

The end of June also enjoyed the sun in Pittsburgh as part of YAPC::NA 2009, aka YAPC|10. While there were some testing-related talks, there wasn't a specific CPAN Testers talk or BOF this year. So much has gone into the work of getting the websites upgraded that I never got the time to prepare a talk about it all. Next year hopefully we'll have a lot more to say about Metabase and the CPAN Testers 2.0 infrastructure. The talk I did do in Pittsburgh, The Statistics of CPAN, did however highlight some very positive numbers about the state of CPAN. If nothing else it highlights that CPAN Testers has a lot of work to continue with for a long time to come. I'm looking at putting a number of the tables and graphs into the CPAN Testers Statistics website, and if you have any suggestions for more, please let me know.

Following the changes in the CPAN Testers Reports website, the old domains now point to the static pages. Thanks to Ask, Robert and Jos for helping out with that. In doing so, a number of issues were pointed out that caused problems for others, specifically with the YAML files that are produced. Due to the vast number of reports now available, processing them is extremely time consuming. To reduce the overhead, I ended up streamlining the data recorded in the YAML and JSON files, as several fields were either repeated or completely redundant. Unfortunately this has meant that some consumers of these files are no longer able to process them correctly. As such there is now a new distribution on CPAN, CPAN-Testers-WWW-Reports-Parser, which can be used to correctly parse a CPAN Testers YAML or JSON file or data block, and return the fields you want. It supports all the fields previously used and knows how to construct them all from the current data set. If you plan on using the CPAN Testers data for a future project, please consider using this to ensure any future changes are instantly picked up with a simple upgrade.
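
To give a rough idea of what "constructing" the dropped fields means, here is a minimal sketch in Python. It is not the interface of CPAN-Testers-WWW-Reports-Parser: the field names (`dist`, `version`, `state`, `distversion`, `status`), the sample record and the `parse_reports` helper are all assumptions made up for illustration. The idea is simply that a redundant field can be rebuilt on request from the fields that are still stored.

```python
import json

# Hypothetical slimmed-down data block. The field names are assumptions,
# not the actual layout of the CPAN Testers YAML/JSON files.
SLIM = '[{"id": 1, "dist": "Foo-Bar", "version": "0.01", "state": "pass"}]'

# Redundant fields that are no longer stored, each rebuilt from stored ones.
DERIVED = {
    "distversion": lambda rec: f'{rec["dist"]}-{rec["version"]}',
    "status": lambda rec: rec["state"].upper(),
}

def parse_reports(text, fields):
    """Return one dict per report containing only the requested fields."""
    rows = []
    for rec in json.loads(text):
        row = {}
        for name in fields:
            if name in rec:
                row[name] = rec[name]            # still stored as-is
            elif name in DERIVED:
                row[name] = DERIVED[name](rec)   # reconstructed on demand
            else:
                raise KeyError(f"unknown field: {name}")
        rows.append(row)
    return rows
```

A consumer written this way only names the fields it wants, so a later change to what is stored needs a change in one place rather than in every script that reads the files.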

Last month we had a total of 165 testers submitting reports. The mappings this month included 34 total addresses mapped, of which 17 were for newly identified testers.

Congratulations to Dan Collins, who managed to post over 89,000 test reports in a single month, the highest we've ever had. Unsurprisingly Chris wasn't too far behind :) I was also delighted to meet up with George Greer at YAPC|10, as for those that weren't aware, George took the honour of the 4 millionth post to the CPAN Testers mailing list at the end of May. A few days later, on June 7th, Serguei Trouchelle posted the 4 millionth accepted test report. Hopefully I'll get to meet Serguei at some point too. On average we have previously been seeing just over 200,000 reports posted each month; however, June saw 358,107 reports posted, a staggering amount of effort from all the testers.

The next summary will hopefully be posted during YAPC::Europe 2009 in Lisbon. If you're a tester and will be there too, please come and say hello.

June 2008?

dagolden on 2009-07-05T01:37:19

Congrats to Dan, but let me know when he breaks 125k and I need to pad my lead more. :-)

-- dagolden

I want my money back! All of it!

Aristotle on 2009-07-05T03:44:45

Just a heads up:

Cross posted from the CPAN Testers Blog.

The feed over there isn’t updating: the last posts from the front page do not show up.

Re:I want my money back! All of it!

barbie on 2009-07-05T07:59:16

Ooops. I'd accidentally disabled it :( Fixed now :) Thanks for letting me know.

Ideas for more statistics

Alias on 2009-07-05T18:56:17

I've been thinking a bit about how to take the CPAN Testers Statistics site up to the next level, as part of my general plan to implement "consciousness" for the CPAN (more on that later in the year).

In particular, I think the CPAN Testers is a critical part of one of the main feedback loops.

1. CPAN Testers locates problems and problem modules.
2. The FAIL 100 prioritises those problems by level of impact.
3. (A future list tracks and rewards the people that fix the problems)
4. CPAN Testers validates the correction of the problem.

In order to make this more efficient, one change I wouldn't mind seeing is a change to (or an addition to) the scoring system you use.

What I'd really love to see is an alternative "game" that counted the number of FAIL and UNKNOWN reports filed, multiplied by the distribution "Volatility" score.

That is, you allocate points based on the ability to locate new and interesting failure cases for important modules.

This is the sort of game I suspect someone like SREZIC (with his 5.5 and 5.6.2 smokers) would do pretty well in. And it would GREATLY increase the incentive for someone to get a VMS smoker running (or some other hard-to-obtain platform) :)

One more thing

Alias on 2009-07-05T18:59:47

BTW, your Uploads count on that first graph is wrong. It's set against a scale that is completely irrelevant.

You might want a separate graph for uploads...

Re:One more thing

barbie on 2009-07-06T08:23:11

That's partly why I started looking at new stats. There are several that are more applicable to the uploads, which is another reason to take them away from the reports graph.
The ones I did for the talk give quite a healthy look at CPAN generally, so I think it's worth promoting that aspect too :)

Oh, finally...

Alias on 2009-07-05T19:08:04

Any chance of getting that per-tarball PASS/FAIL/NA/UNKNOWN total alternative SQLite database export? :)

I'm up to integrating CPAN Testers for my CPANDB stuff, and I'd love to download something smaller than 200 meg every day.
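
To be concrete about the shape I mean, something like the following Python sketch would do. The table and column names are my own assumptions rather than the schema of the existing export; the point is one row per tarball with four counters instead of one row per report.

```python
import sqlite3

def build_summary(reports, path=":memory:"):
    """Load (dist, version, state) rows and build a per-tarball totals table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE reports (dist TEXT, version TEXT, state TEXT)")
    conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", reports)
    # In SQLite a comparison evaluates to 0 or 1, so SUM() counts matches.
    conn.execute("""
        CREATE TABLE summary AS
        SELECT dist, version,
               SUM(state = 'pass')    AS pass_count,
               SUM(state = 'fail')    AS fail_count,
               SUM(state = 'na')      AS na_count,
               SUM(state = 'unknown') AS unknown_count
        FROM reports
        GROUP BY dist, version
    """)
    conn.commit()
    return conn

# Illustrative rows only
sample = [
    ("Foo", "0.01", "pass"),
    ("Foo", "0.01", "fail"),
    ("Foo", "0.01", "pass"),
    ("Bar", "1.0", "na"),
]
conn = build_summary(sample)
rows = conn.execute(
    "SELECT * FROM summary ORDER BY dist, version"
).fetchall()
```

The summary table grows with the number of tarballs rather than the number of reports, which is what would keep the daily download small.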

Re:Oh, finally...

barbie on 2009-07-05T19:58:31

It's currently running in the background. I just haven't had the time to ensure the data it produces is correct yet. I'll try and get some time this week to verify the results and get it publicised as soon as I can.