CPAN Testers Reports - Future Plans

barbie on 2008-08-23T08:31:51

For the past month I've been working on the code upgrades to CPAN Testers Reports, and since taking it over one of the aspects that has been requested several times has been to restrict the content that is seen in the browser. In some cases this is based on the author only wanting to see the FAIL reports, or a user only wishing to see reports that relate to their OS and/or version of Perl, etc. There are essentially 405 different ways of displaying the page for the combinations I'm currently working with, and I haven't yet worked out the OS and Perl version yet!

Because the reports site is currently static and rebuilt periodically, this method of refining results relies on the browser being Javascript enabled, changing the CSS settings as appropriate. However, with the sheer volume of data (a complete site rebuild now takes several hours), this isn't perhaps the most efficient way to actually produce the site anymore. With that in mind, I've been looking at ways to improve this, and the most obvious one is to change the reports site to a dynamic site. Only processing pages that are requested means there are less pages being rendered, which can be cached to reduce processing, and thus reduces disc space. Part of the problem at the moment is that the site takes up 3.7GB with 85,120 pages. Some may claim disk space is cheap, but with the current VMs I have access to that's a notable portion of my allocation, especially when you factor in the 17GB data store of all the NNTP posts since CPAN Testers started.

So, the next release of the code will likely be the last one for a static site. It will be released with all the requested fixes that I've been able to implement, and will hopefully clear out most of the current RT queue. There are still some fixes that need to be investigated, but in the main I think most people will be pleased with the changes so far.

After the next code release, the site generation code will be reworked to run as a dynamic site. The user will have the ability to set preferences to what they see, which will then be stored in a cookie, with the immediate benefit being that users returning to the site will repeatedly see only the content they wish to see. However, there is also another benefit. I had a discussion with JJ the other day and he was of the very strong opinion that the numbers that appear on each distribution page of search.cpan.org, regarding the number of reports it has, are at worst erronous and at best misleading. I agree. Currently I would rather just see a link to the appropriate page on the reports site. Being able to use the cookie preferences and an API to retrieve the content for that line of text, search.cpan.org, Kobes search, or any other CPAN search site of the future, need only send a request for that distribution, and a line can be returned that is specifically tailored to that user.

With the volume of reports having increased from a few thousand a month to hundreds of thousands a month in the last year or so, the daily rebuild gets out of date very quickly. Having a dynamic site would make it much easier to keep the content current. It does mean that the current implementation for the data store, using a SQLite database, isn't suitable for this kind on interaction. As such the knock on effect is that the generation code will now need to write to a database that is more agreeable to web delivery. This then leads on to the next major change that is planned for CPAN Testers.

Changing to a dynamic site will also mean that once we finally move to using the HTTP transport method for reports, and using the CPAN::Metabase API, I should be able to change the backend relatively simply to interface to whatever database we end up using to store the reports. Or rather the reports metadata. It also means the site can actually be used virtually realtime, rather than the current lag due to the SMTP and NNTP delays we currently experience. SMTP and NNTP have their own failings, which have been a constant source of frustration to Ask and Robert, who have been very kindly looking after all this all this time. Moving to a more reliable data store (and therefore recovery mechanism) will also mean that we shouldn't lose reports and really REALLY long reports don't get truncated, meaning all that really useful 'perl -V' information isn't lost either.

It all basically means that a lot of work on the backend is expected to happen over the next year. My plan was to get a lot of this working for the reports site by the end of the month, but that is now unlikely to happen. The next release of the static site will be done by then, but the dynamic site will take longer to develop.

If you have ideas for further improvements to the site, please post them to the RT Queue. If you have general ideas for improving CPAN Testers, including the CPAN::Metabase API, please consider joining the cpan-testers-discuss mailing list and sharing them with most of the CPAN Testers developers.


Cookies

Aristotle on 2008-08-25T06:45:01

The user will have the ability to set preferences to what they see, which will then be stored in a cookie, with the immediate benefit being that users returning to the site will repeatedly see only the content they wish to see.

Ay yeye yeye… careful. Please be extremely cautious with cookies as the site’s usefulness can easily suffer. It seems to me that it would be best to simply properly expose everything via URI. Users who are only interested in a particular report can just bookmark that. There is no need to create machinery for preferences and their recovery in the server, IMO. And the more URLs you have, the more linkable the site is, so the more useful it then is to the rest of the web.

Re:Cookies

barbie on 2008-08-25T08:55:44

Yes the preferences will be available via the URL too. I want the user to be able to customise their regular visits to the site via cookies, but they should also be able to view anything they want without altering those settings too.

As you say it may help to link to specific preferences, that an author or user wishes to highlight, and generally make the site more interactive.

The idea for customising your regular visits is to enable those users who only ever want to see a particular set of preferences (e.g. official Perl version releases, no patched Perls and no distribution development releases) don't have to repeatedly set those settings for each page they visit. They will be able to temporarily change them should they wish to other or all values.

This has been a request made by several authors and it has benefits beyond just viewing the pages, such as other sites being able to use a users preferences to only access the information they are likely to want to know about.