How many CPAN modules on the same topic is too many?

Alias on 2006-08-22T21:49:17

Once I finished writing PPI (a process that spanned 3 years), it occurred to me that I had reached my scaling limit. There was just no way I was ever going to be able to write anything larger than that on my own and remain effective.

Ever since, my favourite area of computing has become what we call "Programming in the Large": dealing with the effective design of code and implementation of concepts far larger than you could ever hope to actually write yourself.

It means learning as much as you can about how things like iteration and evolution can be intentionally applied to your design, and learning how to treat concepts like "trust" and "property" as fundamental elements of design. And learning how to anticipate areas where, if you just give it a little bit of a push (like the Win32 Perl effort), you can get things past a sticking point and the natural inventiveness of others will take over and do the rest of the work for you (and a big thank you to everyone that has been putting so much work in lately finding and fixing CPAN modules on Vanilla/Strawberry Perl).

Because I really like this area, I've also gotten much more heavily involved in the CPAN. After a couple of years immersed in it, I think I mostly "get" the CPAN. Exactly why it works so well, when others have failed, and the areas it should be doing better.

But one issue has remained a bit of a challenge for me, and that is the concept of "choice" as an element of design.

When it comes to common areas, CPAN often has more than one module that does the same thing. As a result, people searching for modules need to choose. That we have choice, and that people can upload any old crap they like, is one of the ultimate reasons for the CPAN's success, because in an iterative and evolutionary environment, betting on any single solution exclusively can be very risky.

Just look at Maypole, who could never have predicted that the ORM module they chose (Class::DBI) would so suddenly fall out of favour with developers and suddenly turn Maypole from the cutting edge into a second-tier choice among the Perl technorati.

It is notable that its logical successor Catalyst embraced choice and diversity, and so when the Class::DBI debacle occurred, it was quite a different story. While maintaining Class::DBI for compatibility, they have switched their default choice to DBIx::Class and have moved on with relatively little negative effect.

But one of the perplexing problems with choice is the problem of too much choice.

I've had it put to me by several Perl trainers that teach university students that having 4 modules for roman numerals is completely silly. It causes a problem for them, because students are trained to look for the one correct answer, so any university assignment featuring roman numerals causes much consternation.

The message from trainers on this issue of choice has been very consistent.

And one need only look at the number of modules for parsing Windows .ini files (my Config::Tiny included) to see how one could be completely flustered by the choices.

Sometimes having 5 modules to choose from, each for a different situation or that makes different trade-offs, is good. Sometimes 5 is too many...

So I was curious to see today that there is work being done on this very problem. On exactly how MUCH choice is a good level of choice. And how too much choice can make people sick.

For all those Programming in the Large devotees, the following video is recommended viewing.

Sorry about the RealPlayer format, but thanks go to the ABC for making it available to the world for free, from their (coincidentally named) Catalyst science TV program.


The only correct answer is

jk2addict on 2006-08-23T01:07:14

42.

I should make a 'greatest hits'

hfb on 2006-08-23T04:57:40

book of all the most frequently annoying questions and criticism of the CPAN and post it somewhere...not that it would ever get read, but at least I'd have cut and paste.

Nobody ever asks why there are 15 different mustards to choose from in the grocery, so I'm never quite sure why people get fussy about the selection on CPAN unless a) they have a module that fits in a namespace they want but is already taken or b) think everyone else's module is shite save for theirs. It's a free archive...pick and choose at will. Caveat Emptor. Variety is the spice of life.

With regard to the trainers who complain that the newbies can't figure out the variety...I've said this so many times you'd think maybe it'd catch on by now. Maybe they're too busy listening to the sound of their own voices. The trainers or, really, anyone who gives a fuck can make their own 'recommended' bundle which would give newbies and fanboys a grouping of modules that have the seal of approval of someone they trust. It's simple. It's easy. And most importantly, it doesn't force the assholey stance after all these years of 'there can be only one'.

Re:I should make a 'greatest hits'

djberg96 on 2006-08-23T14:51:20

"Nobody ever asks why there are 15 different mustards to choose from in the grocery..."

Agreed. That's why I'm going to take a cue from Shin Honda and release File::STat, File::STAT, File::staT, and File::StaT tomorrow!

Because, hey, variety is the spice of life.

Re:I should make a 'greatest hits'

Alias on 2006-08-23T22:06:07

If you had watched the video, you'd have perhaps gotten my point.

If you offer people 24 mustards to try from, lots of people will come and look, but not many will buy.

If you offer people 6 mustards to try from, fewer people will come and look, but MORE people will buy.

That is the critical point here.

What I'm curious about is IF this applies to CPAN modules.

I'm not complaining that it does.

I don't know anyone left related to CPAN, me included, that honestly thinks that only having one choice for a single topic is a good thing.

The trainers understand this as well. They aren't idiots.

But that doesn't avoid the problems they have with students, especially ones for whom programming Perl isn't what they want to do with the rest of their lives, and who are just trying to get through the assignment.

CPAN and choice

pjf on 2006-08-31T05:23:48

I should expose a few things up front. I run Perl Training Australia, and my full-time job for the last five years has been teaching people how to use Perl. I've got a good idea what newcomers to the language like, and what they don't like.

They don't like the CPAN. They absolutely adore it. One of the best parts of running an introductory course is letting our students play with the CPAN. It's very rare for a student not to find a module that directly helps them with their day-to-day work, with their leisure activities, or most often both. Everyone loves the CPAN.

What they sometimes find frustrating is when they "just want a module that does X", and their CPAN search reveals hundreds of results. Reading through (and testing!) all the options takes a lot of time, and isn't necessarily going to be a productive way of solving their task, so how do they make a decision?

The other thing about the CPAN is that there are some really life-changing modules out there. Finding Class::DBI / DBIx::Class changed my life. So did WWW::Mechanize. As someone new to Perl, you may be unaware of these modules. You may not have even thought about searching for them. How do you go looking for something you don't even know exists?

These two questions arise all the time in our training courses, and are also common questions at the conferences and user-groups I attend. I have two standard responses:

The first is to become active in the Perl community. That can be perlmonks, your local Perl Mongers group, or some other way of keeping in contact with others who use Perl. Very often these communities will be able to provide excellent answers to questions like "which XML module should I use", or "what's your favourite Perl module?".

My second response always seems to point to the Phalanx 100. These are the CPAN's most commonly downloaded and tested modules. Lots of people are using them. In the absence of other factors, picking a module from the Phalanx 100 is usually a good choice.

The majority of people to whom I teach Perl are already software developers, and they already have a degree or industry experience. However, as Adam has mentioned, teaching Perl at a university level is a different matter.

University students, in the vast majority of cases, are more interested in getting good marks than exploring the possibilities of choice or doing things "the best way". Your grades are what get shown on your academic transcript, and what your teachers, parents, peers, and potential employers see. Having four modules for Roman numerals doesn't fill you with joy, it just makes the assignment harder.

There will always be exceptions to this. Some students will be truly interested in the subject, and really there for the joy of learning and discovering. However most will just want the grades, and some will only be doing the subject as it's a pre-requisite for something they'd rather be doing.

I don't have a good solution to this one. I can say that teaching Perl in the later years of a degree can help, since by this time students already have a wider exposure to languages and ideas.

Cheerio,

Paul

Education, Tyranny, and Solutions

duff on 2006-08-23T06:20:40

It causes a problem for them, because students are trained to look for the one correct answer, so any university assignment featuring roman numerals causes much consternation.

This is such a crime! What ever happened to critical thinking? If anything could sum up my dissatisfaction with the education systems I've been exposed to it's that they train people to put blinders on and find "the" answer. :-(

You may be interested to know that there was a Scientific American article not too long ago entitled "The Tyranny of Choice" that explored the subject just as the video you linked to. It may have even been generated by the same people for all I know.

hfb has a point. One thing the Perl community hasn't grown yet is a way to say "module X is the best general purpose module for task Y, never mind the 50 other modules that do roughly the same thing". CPAN ratings is a step in that direction, but I'm not sure it's quite going to get us there. I'd put more stock in reviews by lwall, merlyn, Damian, etc. than I would in reviews by bunches of random people.

Maypole vs Catalyst

slanning on 2006-08-23T09:49:25

I'll leave the "too many choices" part alone, as I basically agree with hfb. :) Also there have been a couple threads on "separating the wheat from the chaff" on perlmonks recently, so it no longer interests me.

But I do have to offer another suggestion for the relative success of Maypole and Catalyst. You claimed:

Just look at Maypole, who could never have predicted that the ORM module they chose (Class::DBI) would so suddenly fall out of favour with developers and suddenly turn Maypole from the cutting edge into a second-tier choice among the Perl technorati.

It is notable that its logical successor Catalyst embraced choice and diversity, and so when the Class::DBI debacle occurred, it was quite a different story. While maintaining Class::DBI for compatibility, they have switched their default choice to DBIx::Class and have moved on with relatively little negative effect.

I wouldn't claim to know for sure, but I don't think Maypole's choice of Class::DBI has much to do with Maypole's being relatively less popular than Catalyst. My impression, having followed Maypole's initial development, is rather that Simon (Cozens) intended it to be more of a proof-of-principle framework that he worked on while he had funding from TPF. Catalyst got started as a fork of Maypole, after Simon lost interest in it. See for example the list of maintainers of Maypole to find that Sebastian Riedel maintained Maypole for a while, until he forked it into Catalyst. If you look in the Changes file of Maypole, you'll also see that several of its contributors are Catalyst developers.

So my conclusion is that the choice of database abstraction layer has little to do with the popularity of Catalyst.

Re:Maypole vs Catalyst

naughton on 2006-08-23T20:28:24

So my conclusion is that the choice of database abstraction layer has little to do with the popularity of Catalyst.

I agree, but here you're refuting a point that Alias didn't make. He didn't say it made Catalyst more popular than Maypole. His point was that, because "Catalyst embraced choice and diversity", they were able to switch "their default choice [of ORM] ... with relatively little negative effect." That's just one example of how a certain amount of choice in building blocks can allow a framework to survive when one of those building blocks goes boom.

Where you referred to "choice of database abstraction layer", notice that Alias referred to it as a "default choice". That hints at Catalyst's answer to his main question, what's the right amount of choice? While Catalyst does offer a lot of options, it takes a middle-way, Pythonesque approach of providing "one obvious way to do it", via both documentation and helper scripts that use good, default options.

I don't think Maypole's choice of Class::DBI has much to do with Maypole's being relatively less popular than Catalyst.

Here your criticism is well taken. The "Class::DBI debacle" Alias refers to occurred well after Catalyst forked from Maypole, and Maypole's popularity was already waning. The fork occurred for many different reasons, several of which probably have something to do with the popularity difference. Not that the reasons for the fork are the whole story, either.

However, a look back at the old Maypole list (start here) shows that "amount of choice" was a big issue that led to the fork. While Sebastian Riedel found Maypole "too CDBI/TT2 focused", many respected developers thought it was a very bad idea to provide more choice in building blocks. Tony Bowden actually wanted Maypole to be "even more entwined with them [CDBI/TT2]." And while it's true that Simon Cozens had given up maintainership, he had hardly "lost interest" in Maypole. Simon more than anyone else pushed Sebastian to fork, and one could argue that the "amount of choice" issue was one of the reasons.

All that said, I find the comparison of Catalyst to Maypole too narrow. I'll be making a long term comparison of Catalyst to Jifty, which takes a more Maypole-like approach of limiting choice in favor of simplicity, but is starting to achieve a Catalyst-like level of hype.

Re:Maypole vs Catalyst

Alias on 2006-08-23T21:57:31

It's true that there was already some muttering on moving away from Maypole at the time, but it WAS still considered to be the dominant choice for web MVC.

While it may have happened in any case that people moved away, having Class::DBI going boom most certainly accelerated the migration.

From the point of view of someone not actively involved in the web MVC developments (since I have my own MVC framework) it was almost an overnight migration, much much faster than you would normally see.

As for Catalyst vs Jifty, my current suspicion is that they don't truly compete with each other at all, and that they complement each other.

Catalyst is by far the best choice when the things you integrate with are more important than your application. In database-centric situations, or where you must play friendly with other people's Important Things, I think Catalyst will continue to be the best choice, as long as they don't screw anything up (and I'm fairly confident that they won't).

Jifty, should it become successful (and I'm not willing to call it that quite yet) becomes the best choice for the opposite situation, where your application is the most important element, and can dictate how the database works.

This gives us a good, highly customisable top-end solution (Catalyst) and a good bottom-end solution (Jifty). There will most certainly be some conflict in the middle ground, but such is the way of things.

That said, long term I don't see any other major ecological niches.

People compare Maypole to Catalyst only because of the timeline, not because of their ecological niche.

Catalyst is not the Maypole killer. Jifty is the Maypole killer.