When a 99% solution works, and when it doesn't

Alias on 2010-03-05T07:29:44

With Miyagawa's cpanminus (or as I like to think of it, cpantiny, but hey, it's his module) now officially the New Shiny of the moment, the same old arguments about "less is more" and "worse is better" are resurfacing.

And the great wheel of opinion circles again.

Rather than bother commenting on cpanminus itself (I suspect many could guess my opinions), I thought I would add some caveats to the praise that will hopefully help you identify and engineer your own equivalent successes.

1. These "subset" modules are truly successful only when you make the accuracy trade-offs worth it.

Thus, the installation must be effortless, the user interface must be super-simple, zero-conf is an absolute must, and the module must succeed in all those niches where the "real" application is too hard, too big, or too clumsy.

If your accuracy is going to suck, NOTHING else is allowed to suck.

2. As I hinted with "real", these subset modules work best when they are an alternative solution, not the only solution.

If you are inaccurate and the only solution, you are an annoying limitation.

If you are inaccurate and an alternate solution, you are handy because you create a kind of user-pays situation. Simple people with simple use cases get a simple solution. Hardcore people with hardcore needs get a hardcore solution.

So if you want to make a small solution, it has to be a subset of something larger that works better. It can't be the only solution.

3. 99% is just a marketing number, but the real number does matter.

Why? Take a look at the Heavy 100 index (http://ali.as/top100/) and you'll notice that many major and popular CPAN modules (like, say, Catalyst) have more than 100 dependencies.

With a 99% success rate (using stupidly naive "statistics") every single module on that Top 100 list will fail to install. In a large system with lots of recursive dependencies, it doesn't take much for your install count to grow to 20 or 50 or 100 dependencies.
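To make the "stupidly naive" arithmetic concrete, here is a minimal sketch. It assumes, purely as a simplification, that every dependency installs independently with the same per-module success rate, so a tree of n dependencies installs cleanly with probability rate^n. The function name is hypothetical and is not part of cpanminus or any CPAN client.

```python
def tree_success_rate(per_dep_rate: float, n_deps: int) -> float:
    """Probability that all n_deps install cleanly, ASSUMING each one
    succeeds independently with probability per_dep_rate."""
    return per_dep_rate ** n_deps


if __name__ == "__main__":
    # Dependency counts taken from the sizes mentioned in the post and comments.
    for n in (20, 50, 100, 273):
        rate = tree_success_rate(0.99, n)
        print(f"{n:>3} deps: {rate:.1%} chance of a clean install")
```

Under this model a 100-dependency install succeeds only about 37% of the time, so failure becomes the likely outcome rather than a certainty. Real install failures are neither independent nor evenly spread across CPAN, which is exactly why the real per-module number matters more than the marketing one.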

So you need to be sure about WHICH subset you want to support, so you can be sure what percentage you REALLY need.
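Turning the calculation around shows what percentage you REALLY need. This is a minimal sketch under the naive independence assumption mentioned above: if a tree of n dependencies should install cleanly with some target probability, each dependency must succeed at target^(1/n). The 95% target and the function name are hypothetical, chosen only for illustration.

```python
def required_per_dep_rate(target: float, n_deps: int) -> float:
    """Per-dependency success rate needed so that n_deps independent
    installs ALL succeed with probability `target` (naive model)."""
    return target ** (1.0 / n_deps)


if __name__ == "__main__":
    # Hypothetical goal: 95% of large installs should succeed end to end.
    for n in (20, 100, 273):
        p = required_per_dep_rate(0.95, n)
        print(f"95% overall with {n} deps needs {p:.4%} per dependency")
```

For 100 dependencies this works out to roughly 99.95% per dependency, so the gap between "99%" and the number you actually need grows with the size of the dependency trees you choose to support.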

For example, despite all the work to support it, only a single module currently uses Bzip2. Would it be worth it to remove Bzip2 support from the CPAN entirely and help that author to convert? Probably.

The biggest benefit of these alternative solutions is often not what they do themselves, but what they demonstrate is possible, pushing the competitors to do it as well.

MojoMojo installs fine with cpanm

draven on 2010-03-05T08:10:47

It might be worth mentioning that when Miyagawa developed cpanminus one of the test cases he used was installing my wiki MojoMojo, the heaviest app on CPAN according to your Heavy 100 index. Thus the actual number is probably higher than 99%, or at least the 1% would be in an obscure corner of the CPAN.

Re:MojoMojo installs fine with cpanm

educated_foo on 2010-03-05T11:12:59
It looks like 273 dependencies (be patient) (which together guarantee its failure to install). This is nothing close to 99% of CPAN, which contains 8020 authors and 17538 modules. As Sage Mencken declared, "For every complex problem there is an answer that is clear, simple, and wrong." Translated into the modern vernacular, "clear" means "99%."

Re:MojoMojo installs fine with cpanm

draven on 2010-03-05T13:29:02
Was just replying to Alias's claim:
With a 99% success rate (using stupidly naive "statistics") every single module on that Top 100 list will fail to install. In a large system with lots of recursive dependencies, it doesn't take much for your install count to grow to 20 or 50 or 100 dependencies.
Given that logic, and the fact that the top module on that list installs successfully, cpanm would have to work with more than 99% of CPAN (or only be broken for very little-used modules). I'm quite aware that MojoMojo does depend on most of CPAN. I think the 273 dependencies we have are quite sufficient (and a bit over 1% of the total number of CPAN modules :)

Re:MojoMojo installs fine with cpanm

Alias on 2010-03-07T23:48:48

I note '(using stupidly naive "statistics")'.
Clearly, 99% does NOT mean that a 271-dependency distribution must fail according to either "real" statistics, or the real world.
Plus, there's a bias there that makes the discussion moot, because cpanm has been specifically checked and monitored to work with that distribution.

Re:MojoMojo installs fine with cpanm

miyagawa on 2010-03-11T01:42:08

False. No, I didn't add any specific workarounds or anything to MojoMojo and its dependencies. I just used that, along with Jifty, Catalyst and KiokuDB, as a known test bed of "huge dependencies of well-maintained modules". I also tested with 100 randomly selected modules from 02packages, with the same result.

Congratulations, you've statistically "proven"

hobbs on 2010-03-05T11:31:42

that cpanm is well beyond the "99%" level if installing things is a metric. Not that it is; any connection between "99% solution" and "install success rate" was invented entirely by you.

Tiny? 99%?

dagolden on 2010-03-05T16:33:48

At 2200+ SLOC, I don't think it qualifies as a "Tiny" module except in comparison to CPAN(PLUS).

-- dagolden

Re:Tiny? 99%?

Alias on 2010-03-08T01:04:25

::Tiny was never about specific line count (although that was the main practical constraint).
If it uses less than 10% of the memory of its peers and doesn't have any dependencies, it largely meets the main criteria (even if it achieves this by bundling some modules and using a few other tricks).

The Rise of Better is Worse

chromatic on 2010-03-05T22:20:03

I remember that before Module::Build could replace MakeMaker, it had to do things MakeMaker didn't do well, if at all (else all it had to do was crib PREFIX support).

Fortunately not everyone believes that the complexity of a system remains constant (and many people believe that removing 90% of the pain of an existing system is sometimes worth increasing pain 10% in a few other cases).