CPAN Testers Stats - July Update - The Times They Are A-Chan

barbie on 2008-08-04T22:40:48

CPAN Testers Statistics

First off, an apology. It seems that I used a legitimate email address in a rant last year about the use of tester email addresses. At the time, the address didn't appear to be real and gave the impression that it was being used as a spamtrap. The owner of the address has since contacted me to inform me that it is real, and that it doesn't get used as a spamtrap. So my sincere apologies to Rob for casting doubt on his email address. However, the address in question had caused confusion, so Rob has kindly altered the address he now sends from, so that authors can feel more comfortable about sending any queries to him. Many thanks to Rob for this.

At the beginning of the month, I launched a new domain for CPAN Testers, cpantesters.org. The new domain will hopefully make it easier to remember all the sites in the CPAN Testers family, and maybe even lead to a whole bunch of new ones. To make it even easier to find the other sites, linking between them with a standard dropdown is now planned, thanks to the suggestions of a few people. The reports site is currently receiving an update and will be rsync'ing to the public site soon.

August sees many of us here in Europe heading to Copenhagen, for the annual YAPC::Europe Perl Conference. On the first day of the event, on Wednesday 13th August, Chris Williams will be presenting 'Rough Guide to CPAN Testing', with a CPAN Testers BOF planned at the end of the day, so be sure to come along to find out more. I'm also hoping to add to my collection of photos of CPAN Testers, so please make sure you come and say hello to me during the conference if you are a CPAN Tester.

As mentioned last month, Leon has given me the go-ahead to update the code behind the scenes for the data collection and creation of the reports site. The first part of the switchover was to merge the code that does the data collection with the previously forked code that is currently used by the Statistics site and CPAN Dependencies site. When I forked, there were a few fields that weren't previously being recorded that were needed (e.g. the tester email address) and a few that weren't used by the stats site that were dropped. Rather than having two separate systems, it makes sense to merge these changes. Plus, in the last month or so, I've picked up on reports sent in Base64 or Quoted-Printable that were previously not being recorded properly. There are some further changes, such as the fact that the stats database also parses the UPLOAD articles, and a whole bunch of new tests, which have resulted in quite a nicely packaged little distribution.

Before launching it on the world, I'm running it in by generating the database from scratch for all articles. Originally I was going to drop the creation of the articles database, which is held locally, but decided to keep it, as it makes lookups a lot quicker. However, it's currently over 7GB and is still not halfway through the rebuild! The latest complete stats database is just over 400MB, and is much more manageable. Thankfully disk space isn't an issue yet, but as a consequence I've made the creation of the articles database optional. Assuming anyone else is mad enough to try and generate this stuff ;)
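For anyone curious about the Base64 / Quoted-Printable problem mentioned above, here is a minimal sketch of the general idea: look at the Content-Transfer-Encoding header of an article and decode the body before parsing it. This is not the actual CPAN Testers code (which is Perl); it is an illustration in Python using the standard library's email package, and the function name decoded_body is made up for the example.

```python
from email import message_from_string


def decoded_body(raw_article):
    """Return the text body of a mail/news article, decoded according to
    its Content-Transfer-Encoding (base64, quoted-printable, or none).

    Hypothetical helper for illustration only; not part of any
    CPAN Testers distribution.
    """
    msg = message_from_string(raw_article)
    for part in msg.walk():
        if part.is_multipart():
            # Containers carry no body of their own; look at their children.
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return ""


if __name__ == "__main__":
    article = (
        "Content-Type: text/plain; charset=utf-8\n"
        "Content-Transfer-Encoding: quoted-printable\n"
        "\n"
        "status=3DPASS\n"
    )
    print(decoded_body(article))
```

Without the decoding step, a parser looking for the report grade in the body would only see the encoded text, which is presumably how such reports ended up being recorded incorrectly.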

This time next month, I hope to have all the CPAN Testers sites running off the same database, and will hopefully have the updated reports site online for all. Thanks again to everyone for the patches, ideas and support. It's much appreciated.

Again we had over 100 testers submitting reports last month, with 43 new addresses mapped, of which we had 7 new testers identified. After David Golden's magnificent effort last month, his bots have settled down now to a more sedate pace. As a consequence Chris has once again taken the top spot this month, and has even posted his highest number of reports in a single month. No doubt he's been getting his daughter to help out while he's been at work ;)


I found this bit at the bottom of the page...

Alias on 2008-08-05T04:54:53

...interesting.

The Database: cpanstats.db.gz (2008/08/05) (0MB compressed, 411MB uncompressed)

Regarding the graph

Alias on 2008-08-05T04:57:17

Can I also note that I'm finding that the "current month" numbers cause more harm than good, since they invariably show a massive "crash" in testing numbers for the current month.

Unless you plan to estimate the current month forwards by a $current * $monthdays / $monthday type multiplication, we're far better off without it on the graph.
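To make that multiplication concrete, here is a small sketch of a straight-line projection. It is written in Python purely for illustration, the function name project_month_total is made up, and it assumes reports arrive at an even rate through the month, which may well not hold.

```python
import calendar
from datetime import date


def project_month_total(count_so_far, today):
    """Scale a partial-month count up to a full-month estimate.

    Equivalent to $current * $monthdays / $monthday, where $monthdays is
    the number of days in the month and $monthday is the day of the month.
    Hypothetical helper; assumes a constant daily rate.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return count_so_far * days_in_month / today.day


if __name__ == "__main__":
    # 1000 reports counted by the 5th of a 31-day month.
    print(project_month_total(1000, date(2008, 8, 5)))
```

Early in the month the estimate is very noisy (a single busy day on the 1st gets multiplied by about 30), so dropping the current month from the graph is the simpler option.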

Re:Regarding the graph

barbie on 2008-08-05T09:39:41

Removed for the next update.

Re:I found this bit at the bottom of the page...

barbie on 2008-08-05T09:31:39

Fixed for the next update. The path to the compressed file was incorrect :(

Newsfeed?

Aristotle on 2008-08-10T21:44:56

You don’t have an on-site newsfeed. Your use.perl journal is sufficient for those of us who subscribe to all use.perl journals, but as a way of providing a dedicated feed for the site (and as a place for permalinks), it is not that great.

Is stats.cpantesters.org driven by any sort of CMS? Is the code available somewhere? I would volunteer to write a patch. (Alternatively, if you take a look at XML::Atom::SimpleFeed, that should make it a snap to add a feed.)

Re:Newsfeed?

barbie on 2008-08-10T22:07:31

Good idea, I'll have a look at that. At the moment the site is built as a static site on a daily basis, so it might even make it easier to create the templates and the XML feed as I add a new entry.

The current code is all available in the tarball (see the link at the bottom of the index page), although a lot is shortly going to change once the new generator code is in place.

Re:Newsfeed?

Aristotle on 2008-08-10T22:29:55

Ah, I didn’t notice that.

Hmm, the 0.46 tarball is a 404 and the -latest tarball only contains 0.44.

Hmm, you have all the updates in a single big file without detailed dates. That’s not ideal to retrospectively generate timestamps from… do you have each change of the update.html recorded in a repository? It would be nice not to have to make up datetimes with an empty time part.

Also, once you start to generate feeds it would be nice for each update to have a separate permalink. That’s much linking-friendlier.

As for writing a template to generate the feed, there are subtleties to getting feeds right WRT making sure you get well-formed XML out the other end. So even if you want to drive generation from a template due to separation of concern issues, I would still suggest doing so by using Template::Plugin::Class to use XML::Atom::SimpleFeed from within the template.

Re:Newsfeed?

barbie on 2008-08-10T22:49:31

Hmm, the 0.46 tarball is a 404 and the -latest tarball only contains 0.44.

Oops, sorry about that. All fixed. I forgot to update the main site with the last two updates.

Regarding the feed, I'll set that up so the content is driven from a standalone file, which will also be used to create the templates for the main static site, for the update pages and the index.