Rethinking the interface to CPAN

petdance on 2008-04-06T07:04:18

I've started a group, rethinking-cpan, for discussing the ideas I've posted here. -- Andy

Every few months, someone comes up with a modest proposal to improve CPAN and its public face. Usually it'll be about "how to make CPAN easier to search". It may be about adding reviews to search.cpan.org, or reorganizing the categories, or any number of relatively easy-to-implement tasks. It'll be a good idea, but it's focused too tightly.

We don't want to "make CPAN easier to search." What we're really trying to do is help with the selection process. We want to help the user find and select the best tool for the job.

It might involve showing the user the bug queue; or a list of reviews; or an average star rating. But ultimately, the goal is to let any person with a given problem find and select a solution.

"I want to parse XML, what should I use?" is a common question. XML::Parser? XML::Simple? XML::Twig? If "parse XML" really means "find a single tag out of a big order file my boss gave me", the answer might well be a regex, no? Perl's mighty CPAN is both blessing and curse. We have 14,966 distributions as I write this, but people say "I can't find what I want." Searching for "XML" is barely a useful exercise.

Success in the real world

Let's take a look at an example outside of the programming world. In my day job, I work for Follett Library Resources and Book Wholesalers, Inc. We are basically the Amazon.com for the school & public library markets, respectively. The key feature of the website is not ordering, but helping librarians decide what books they should buy for their libraries. Imagine you have an elementary school library, and $10,000 in book budget for the year. What books do you buy? Our website is geared to answering that question.

Part of this is technical solutions. We have effective keyword searching, so you can search for "horses" and get books about horses. Part of it is filtering, like "I want books for this grade level, and that have been positively reviewed in at least two journals," in addition to plain ol' keyword searching. Part of it is showing book covers, and reprinting reviews from journals. (If anyone's interested in specifics, let me know and I can probably get you some screenshots and/or guest access.)
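To make the "keyword search plus filters" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Book` record, its fields, the `search` function, and the sample titles are all invented for this example and say nothing about how the FLR or BWI sites are actually built.

```python
from dataclasses import dataclass


@dataclass
class Book:
    # Hypothetical record layout, invented for this sketch.
    title: str
    keywords: set
    grade_levels: set
    positive_reviews: int  # number of journals that reviewed it positively


def search(books, keyword, grade, min_reviews=2):
    """Match on keyword, then narrow by grade level and review count."""
    keyword = keyword.lower()
    return [
        book
        for book in books
        if keyword in book.keywords
        and grade in book.grade_levels
        and book.positive_reviews >= min_reviews
    ]


# Made-up catalog entries, not real data.
CATALOG = [
    Book("All About Horses", {"horses", "animals"}, {2, 3, 4}, 3),
    Book("Horse Care for Teens", {"horses", "pets"}, {8, 9}, 2),
    Book("My First Pony", {"horses", "ponies"}, {2, 3}, 1),
]

matches = search(CATALOG, "horses", grade=3)
print([book.title for book in matches])
```

The point of the sketch is that the keyword match alone returns everything about horses, while the grade and review filters cut the list down to what this particular librarian can actually use.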

BWI takes it even further. There's an entire department called Collection Development where staff librarians select books, CDs & DVDs to recommend to our customers' librarians. The recommendations could be based on choices made by the CollDev staff directly. They could be compiled from awards lists (Caldecott, Newbery) or state lists (the Texas Bluebonnet Awards, for example). Whatever the source, they help solve the customer's problem of "I need to buy some books, what's good?"

This is no small part of the business. The websites for the two companies are key differentiators in the marketplace. Specifically, they raise the company's level of service from simply providing an item to purchase to actually helping the customer do her/his job. There's no point in providing access to hundreds of thousands of books, CDs and DVDs if the librarian can't decide what to buy. FLR is the #1 vendor in the market, in large part because of the effectiveness of the website.

Relentless focus on finding the right thing

Take a look at the front of the FLR website. As I write this, the first thing a user sees on the page is "Looking for lists of top titles?" That link leads to a page of lists for users to browse. Award lists, popular series grouped by grade level, top video choices, a list called "Too good to miss," and so on. The entire focus that the user sees is "How can I help you find what you want?"

Compare that with the front page of search.cpan.org. Twenty-six links to the categories that link to modules in the archaic Module List. Go on, tell me what's in "Control Flow Utilities," I dare you. Where do I find my XML modules? Seriously, read through all 26 categories without laughing and/or crying. Where would someone find Template Toolkit? Catalyst? ack? Class::Accessor? That one module that I heard about somewhere that lets me access my Lloyd's bank account programmatically?

Even if you can navigate the categories, it hardly matters. Clicking through to the category list leads to a one-line description like "Another way of exporting symbols." Plus, the majority of modules on CPAN are not registered in the Module List. The Module List is an artifact a decade old that has far outlived its original usefulness.

What can we do?

There have been attempts, some implemented, some not, to do many of these things that FLR & BWI do very effectively. We have CPAN ratings and keyword searching, for example. BWI selects lists of top books, and Shlomi Fish has recently suggested having reviews of categories of modules, which sounds like a great idea. I made a very tentative start on this on perl101.org. But it's not enough.

We need to stop thinking tactically ("Let's have reviews") and start thinking strategically ("How do we get the proper modules/solutions into the hands of the users who want them?"). Nothing short of a complete overhaul of the front end of the CPAN will make a dent in this problem. We need a revolution, not evolution, to solve the problem.


Reviews of categories

Juerd on 2008-04-06T20:34:08

Part of this solution is to seek existing comparisons. One that I know, because it refers people to my module DBIx::Simple, is Aristotle's review titled "A brief survey of the DBI usability layer modules on the CPAN", at http://www.perlmonks.org/?node_id=504724

That class of article is my favourite when picking new modules, but it's very hard to find. Is collecting these writeups what you're after?

You nailed the problem

Lecar_red on 2008-04-07T16:31:42

You are completely correct that CPAN (or some other interface to it) needs to solve the problem of getting the right tools to the right users. I think if you aren't completely in the Perl community, filtering through the sheer volume of CPAN modules to find the "best" or "best for my purpose" is really hard and very time-consuming. (Even if you are active, it's hard to figure out sometimes.)

I think Shlomi Fish's idea of articles is great. It would be great to have many different authors. Maybe a good place for sponsorship from the TPF or from companies? (This week's sponsor is Joe's Diner and Auto Repair.)

I can see some challenges with articles, though. They go stale: I think PBP did a good job of pointing out modules to use, but even this ages. Things like PerlMonks and wikis 'feel' too temporary. It would be nice to have a site with a community of authors and editors that can support this type of documentation, maybe like Wikipedia, with entries coming from anyone but with editors to ensure quality, and maybe a way to flag articles that are aging and need to be updated or at least reviewed. (Maybe include the version of the module that the article was based upon?)

Also, it needs to merge some of the review content and taxonomy bits from CPAN.

need a subject

educated_foo on 2008-04-12T16:13:58

I would rather not create a Google account just to chime in, but I agree with a recent poster that download stats would be very useful. As a backwater module author, I don't get enough bug reports and thank-yous to have a sense of how many people are using what I write.

You will respond with "blah blah mirrors blah blah," but (1) there just aren't that many CPAN mirrors, and (2) even a rough estimate would be useful, so it doesn't matter if some don't participate.

Re:need a subject

petdance on 2008-04-12T20:01:59

You're incorrect about my response.

Download stats are off my radar for two reasons. One, I'm not interested in this project improving CPAN for the authors. The authors already get all the attention. Two, the changes I'm looking at making are entirely separate from the existing CPAN infrastructure. Download stats would require changing how CPAN materials are distributed, and that's not a job for this project.

Everything on this project is based on the premise of providing a new face on existing infrastructure, at a distance.

Re:need a subject

educated_foo on 2008-04-14T01:06:46

Sorry, I shouldn't have tried to put words in your mouth.

First, download stats would be useful for both authors and users (e.g. "which of these Test::* modules do most people use?"). Second, a script to anonymize download logs and send them to someone could be completely separate from the mirror/download infrastructure (it's kind of an implementation detail ;). It doesn't look like you're interested in going this way with your project, though, and limited scope can be a good thing, so I won't press the issue further.