Idea! CPAN Version Advisories

brian_d_foy on 2007-09-17T02:26:00

My friend Jeff woke me up today because I IM'd him late last night saying "I have this great idea and I need to talk with someone about it!" This isn't that idea, but another one Jeff suggested.

We got to talking about the problem of CPAN dependencies. He mentioned that Python modules don't really have many dependencies, but they're not very complex either. CPAN modules can do some really complex things because they can let some other module do the work, but that leads to a long, and fragile, dependency chain.

We talked about some of the attempts to solve that and why they suck. They mostly revolve around depending on a fixed version. Debian packages can depend on a major version of another package (say, 3.x) and won't use a newer major version. Module::Install tries to solve it by shipping a fixed version of itself and any dependencies. only.pm tries to solve it by coding in specific versions. The problem with these approaches is they try to freeze the upgrade process.

The idea that modules get better over succeeding revisions is basically correct in the long run. Software is not wine; old versions don't get better with age. We just adapt to their quirks. The more adapted we are to a particular version's quirks, the harder it becomes to upgrade. The dynamic upgrade environment avoids this particularly nasty sort of bit rot.

Jeff explained Debian's system and said it exists to avoid API changes: Debian policy doesn't allow incompatible API changes within a major version. But API changes aren't really CPAN's problem; the breakages are usually bugs. With that realization, Jeff pointed out the key thing: the problems are *temporary*.

A single version of a CPAN module has a bug which breaks your dependency chain, but the next version will probably fix it. Or the next one, or the next one. Eventually, the march of versions forward will fix the problem. So hard coding "only use version 1.34" into your module takes a temporary problem and applies a permanent halt to upgrades.

So the solution should also be temporary and only last as long as the problem exists. Wouldn't it be neat if there was some way to flag the upgrade of a dependency as problematic? Wouldn't it be neat if there was a way for the CPAN shell to query a service and request the advisories about a particular module? It's like traffic advisories.

"...and here's Michael Schwern with the CPAN traffic report. Michael?"

"Bill, I'm flying over the latest RSS feed and DBI is jack knifed across two lanes of dependency traffic. It looks like a bug fix in Test::More 0.71 revealed a flaw and caused DBI's test suite to skid. A couple of XML modules got tangled up in that crash, the patching crews expect to have them fixed by tomorrow evening. Those of you thinking about installing DBI should avoid the Test::More 0.71 and take the 0.70 instead today. We'll have more updates at the top of the hour. Over to Pudge with the sports. Pudge?"

"Today the Red Sox..."

What happens is when there's a dependency problem, say a bug fix in Test::More reveals test failures in DBI (which it does), the author of the dependent module (Andreas) can report a CPAN Version Advisory that DBI 1.59 and Test::More >= 0.71 are incompatible. When a user goes to install DBI the CPAN shell queries the service for any advisories. It uses that information to decide what to do. In this case it might say "DBI 1.59 is incompatible with your installed version of Test::More (0.71). Would you like to downgrade to 0.70?" It could even get clever and note that a given module is only a build requirement and temporarily downgrade just for the build and tests.
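The lookup described above could be sketched roughly like this. It is only an illustration in Python, not anything the CPAN shell actually does: the `Advisory` record, its field names, the `advisories_for` function and the in-memory `feed` are all made up for the example, and `parse_version` is a deliberate simplification of real Perl version comparison.

```python
from dataclasses import dataclass


def parse_version(text):
    """Turn '0.71' into (0, 71). A simplification of real Perl version rules."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Advisory:
    """Hypothetical advisory record; every field name here is an assumption."""
    module: str          # module being installed, e.g. "DBI"
    module_version: str  # affected release of that module
    dependency: str      # the other side of the broken relationship
    broken_from: str     # first dependency version known to break
    suggested: str       # dependency version the advisory recommends


def advisories_for(module, version, installed, advisories):
    """Return the advisories that apply to installing module/version here."""
    hits = []
    for adv in advisories:
        if adv.module != module or adv.module_version != version:
            continue
        have = installed.get(adv.dependency)
        if have is not None and parse_version(have) >= parse_version(adv.broken_from):
            hits.append(adv)
    return hits


# Stand-ins for the advisory service and the local inventory.
feed = [Advisory("DBI", "1.59", "Test::More", "0.71", "0.70")]
installed = {"Test::More": "0.71"}

for adv in advisories_for("DBI", "1.59", installed, feed):
    print(
        f"{adv.module} {adv.module_version} is incompatible with your installed "
        f"version of {adv.dependency} ({installed[adv.dependency]}). "
        f"Would you like to downgrade to {adv.suggested}?"
    )
```

Removing the entry from `feed` is all it takes to lift the advisory, which is the point: nothing is baked into a release.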

You could even go so far as to have the advisories contain a patch.

Later on, when DBI puts out a fixed version and the dependency conflict is resolved, Andreas can remove the advisory and dependency resolution marches forward as normal.

The advantages this has over hard-coding this information into, say, DBI's own Makefile.PL is that it's orthogonal to module releases. The problem occurs between DBI releases. DBI would have to release a new version just to say "I don't work with Test::More 0.71" and that's a lot of bother. It also does not require the author to issue the advisory. The author of the dependency could issue an advisory, or a trusted 3rd party. This removes several bottlenecks from the process.

It would seem to relieve many of the day-to-day problems with failing CPAN dependencies these days while hanging onto the basic, forward-looking assumption that the latest version is good.

CPAN Tester Reports

barbie on 2007-09-17T07:00:42

An interesting idea. It actually crossed my mind that in many cases CPAN Testers spot these sorts of incompatibilities, but with the scale of the reports and CPAN it could get tricky and very time consuming to review them all. Or were you only thinking of doing this with major distributions (e.g. the Phalanx 100), or when an actual user hits a problem?

Once PITA is up and running, this sort of confirmation of fixes could be done fairly quickly, perhaps with an automated test harness that tests notable combinations of dependencies of a distribution. It might make authors more aware of older dependencies that should be avoided by upping the version in their prerequisites.

Re:CPAN Tester Reports

schwern on 2007-09-17T22:24:34
Just humans. We usually know what's wrong, we just don't have a way of letting the installers know.

You can't ask the users anything...

Alias on 2007-09-17T08:00:04

> What happens is when there's a dependency problem, say a bug fix in
> Test::More reveals test failures in DBI (which it does), the author of
> the dependent module (Andreas) can report a CPAN Version Advisory that
> DBI 1.59 and Test::More >= 0.71 are incompatible. When a user goes to
> install DBI the CPAN shell queries the service for any advisories. It
> uses that information to decide what to do. In this case it might say
> "DBI 1.59 is incompatible with your installed version of Test::More (0.71).
> Would you like to downgrade to 0.70?" It could even get clever and note that
> a given module is only a build requirement and temporarily downgrade just
> for the build and tests.

"Would you like to downgrade?"

It's going to be extremely difficult for users in this situation. If we apply the "7 layers of recursion" principle, how are they going to have any idea what to do in this situation?

Most likely they'll not know, but your question suggests they probably should.

So they agree to a downgrade (which CPAN.pm doesn't REALLY know how to do properly anyway, except by installing over the top of the newer version) and now they have an older version of a module which breaks a dependency of something ELSE which needed the new version to work.

I see lots of places where this could go wrong...

Perhaps an alternative would be to flag the new Test::More version as "pathological", which would (temporarily) remove it from the index.

This would let you "back out" a buggy (either itself, or downstream) released version, with the index reverting to an older version.

You'd get a window to fix the downstream modules, and then you could enable that release again...
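As a rough illustration of the flagging idea (a Python sketch only; the `indexed_version` function and the `flagged` set are invented for the example and say nothing about how the real CPAN index is built), the index would simply report the newest release that is not currently flagged:

```python
def parse_version(text):
    """Turn '0.71' into (0, 71). A simplification of real Perl version rules."""
    return tuple(int(part) for part in text.split("."))


def indexed_version(releases, flagged):
    """Newest release that is not currently flagged, or None if all are."""
    candidates = [v for v in releases if v not in flagged]
    return max(candidates, key=parse_version, default=None)


releases = ["0.70", "0.71"]

flagged = {"0.71"}                          # release marked "pathological"
print(indexed_version(releases, flagged))   # index falls back to 0.70

flagged.clear()                             # downstream fixed, flag lifted
print(indexed_version(releases, flagged))   # index points at 0.71 again
```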

Re:You can't ask the users anything...

schwern on 2007-09-17T22:45:57

> It's going to be extremely difficult for users in this situation. If we apply the "7 layers of recursion" principle, how are they going to have any idea what to do in this situation?

It provides the user with *some* information and a way out, as opposed to nothing. Part of the point is just to make this sort of ad-hoc information that we normally communicate via word-of-mouth available to the world in a human and machine readable format.

But I see your point. Yes, the downgrading may be problematic, but if that's the recommended solution then there it is. Including a link to the human-friendly, textual description of the problem would help with the decision.

Additionally, for build and test dependencies the downgrade would be in a temporary installation just for the duration of the build. This part could safely be automatic.

> Perhaps an alternative would be to flag the new Test::More version as "pathological", which would (temporarily) remove it from the index.

There are several problems with that approach. For one, it throws the baby out with the bathwater. Test::More 0.71 works fine, but other modules depended on a bug. It's not Test::More that's the problem, it's particular dependencies upon it. Removing it from CPAN because of this also removes any other bug fixes that went along with it. There was a similar problem way back when Test::Builder changed its output format and broke Test::Builder::Tester. I didn't want to pull that version off of CPAN either because it fixed so many other problems. It's an all or nothing approach.

The nice part about the advisories is it only targets the broken relationship. What if 0.71 fixed five dependencies and broke one?

It requires the author's intervention. Often the very problem is that an author is not being responsive. It does not address the bottleneck issue. You could just have the CPAN cabal unilaterally decide to yank a module from the index, but this is not a happy making thing, CPAN has always been very hands off, and it just moves the bottleneck to the CPAN cabal. The advisories allow either end of the relationship to report the problem.

Re:You can't ask the users anything...

Alias on 2007-09-17T23:34:43
> It requires the author's intervention. Often the very problem is that an
> author is not being responsive. It does not address the bottleneck issue.
> You could just have the CPAN cabal unilaterally decide to yank a module from
> the index, but this is not a happy making thing, CPAN has always been very
> hands off, and it just moves the bottleneck to the CPAN cabal. The advisories
> allow either end of the relationship to report the problem.

As I understand it, in your description it's the CPAN "cabal" that is doing the advisories anyway.

Or maybe I'm just getting confused because you mentioned Andreas, who isn't responsible for either Test::More OR DBI...

Re:You can't ask the users anything...

schwern on 2007-09-18T00:38:07
My mistake, that should have said "Tim". Fixed that.

To clarify, the people issuing the advisories are the authors of the modules affected. The author on either side of the relationship can issue the advisory, so in my Test::More / DBI example it's either me or Tim Bunce.

It's also possible that a 2nd or 3rd order dependent could issue an advisory. So, say, Matt Trout could issue the Test::More / DBI advisory because DBIx::Class needs DBI and his users are affected by the breakage.

There would also be trusted 3rd parties who can issue advisories, some hand waving there about exactly how that's determined. Advisories could also be submitted for consideration which is how Joe User would report a problem, have it checked by a trusted human and promoted to an advisory.

Re:You can't ask the users anything...

Alias on 2007-09-18T02:52:45
Checked by, say, the CPAN cabal? :)

Re:You can't ask the users anything...

schwern on 2007-09-18T08:58:03
No, the CPAN Advisory cabal. :P

Re:You can't ask the users anything...

Aristotle on 2007-09-18T01:46:57

> So they agree to a downgrade (which CPAN.pm doesn’t REALLY know how to do properly anyway, except by installing over the top of the newer version) and now they have an older version of a module which breaks a dependency of something ELSE which needed the new version to work.

Of course, as with most things dependency-related, Debian got that right a long time ago, by tracking “reverse dependencies”, i.e. not just what packages a package depends on, but also which packages it is a dependency of.

Given such information, the CPAN client could automatically decide whether a particular upgrade or downgrade is unsafe.

Re:You can't ask the users anything...

Alias on 2007-09-18T02:55:45
Debian is an excellent binary repository, but it is most certainly NOT a great model if you have a source repository.

For example, our dependencies are complete (programmatic in nature), vary across platforms and as such it's extremely difficult to know downstream deps.

That said, for any given host, you can "localize" the dependencies to static form, which could perhaps then be saved somehow...

The META.yml -> META-LOCAL.yml idea is looking better and better.

Re:You can't ask the users anything...

Aristotle on 2007-09-18T04:44:37

I don’t know what difference source vs binary would make in this case. The local inventory of installed software almost always differs from the repository, which generally only tracks information about the newest version of every package. So to use reverse dependency checks you always have to record them locally after each successful installation, anyway.

The CPAN client would have to determine the local final list of dependencies and then figure in the recorded reverse dependencies, and could then tell you whether any unresolvable conflicts exist and hash out a resolution for the other conflicts automatically.

Of course, the CPAN client does not actually keep any inventory of the locally installed stuff at all, let alone recording reverse dependencies, so deploying all this will take a fair amount of new infrastructure in the toolchain and quite a long period of incubation until we can rely on it. (But it would not require ubiquity before attaining usefulness, so it’s a realistic goal.)
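A minimal sketch of such a local inventory, in Python for illustration. The `Inventory` class, its method names and the module names `Other::Module` and `Legacy::Module` are all hypothetical; no existing CPAN client records this, and `parse_version` is a simplification of real Perl version comparison.

```python
from collections import defaultdict


def parse_version(text):
    """Turn '0.71' into (0, 71). A simplification of real Perl version rules."""
    return tuple(int(part) for part in text.split("."))


class Inventory:
    """Hypothetical local record of installs and their reverse dependencies."""

    def __init__(self):
        self.installed = {}                   # module -> installed version
        self.required_by = defaultdict(dict)  # dependency -> {dependent: minimum}

    def record_install(self, module, version, requires):
        """Record a successful install plus the minimum versions it asked for."""
        self.installed[module] = version
        for dependency, minimum in requires.items():
            self.required_by[dependency][module] = minimum

    def blockers(self, module, new_version):
        """Installed dependents whose recorded minimum new_version would not meet."""
        wanted = parse_version(new_version)
        dependents = self.required_by.get(module, {})
        return sorted(
            dependent
            for dependent, minimum in dependents.items()
            if parse_version(minimum) > wanted
        )


inv = Inventory()
inv.record_install("Test::More", "0.71", {})
inv.record_install("Other::Module", "1.0", {"Test::More": "0.71"})
inv.record_install("Legacy::Module", "2.0", {"Test::More": "0.40"})

# Downgrading Test::More to 0.70 is unsafe only for dependents needing 0.71.
print(inv.blockers("Test::More", "0.70"))
```

With that in hand, the client can refuse (or at least warn about) a downgrade whenever `blockers` comes back non-empty, instead of asking the user to guess.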

Nice timing!

ChrisDolan on 2007-09-18T05:10:43

This is well timed. A cpan-testers report from David Cantrell today revealed that Perl::Critic 1.072 made an API change that broke Perl::Critic::More. It's been broken for two weeks and 6 minor releases of Perl::Critic, but nobody noticed.

So, we'll probably push out a Perl::Critic 1.078 soon, but in the meantime, you have to choose between Perl::Critic bug fixes or Perl::Critic::More passing its tests.

The sad thing is that the failure happened while a test script was trying to decide if it was in "author mode" and consequently whether it should even run (the correct answer was no...)

versioning is broken

dito on 2007-09-18T10:39:26

The basic problem is that putting the burden of "what version can I use?" on the user is broken. By user I mean author of a module which uses another module. The library maintainer knows far more about compatibility than the average user and so the library maintainer should be able to publish that information (out of band).

This post goes quite a bit further (probably too far for Perl). I actually implemented the engine for the abstract stuff in that post but the most important aspect is actually the interface. How to allow people to express compatibility without breaking their wrists or their brains.

Error in installing DBIx::Class

realzhang on 2007-09-18T10:46:39

When I try to install DBIx::Class on WinXP with PPM, the error occurs as follows:

ERROR: File conflict for 'C:/Perl/html/site/lib/Test/Builder/Tester/Color.html'. The package Test-Simple has already installed a file that package Test-Builder-Tester wants to install.

Who could help me? Thanks!!!

Re:Error in installing DBIx::Class

schwern on 2007-09-19T02:03:00
Contact the person who made those PPMs, probably somebody at ActiveState.

PPM is a whole different beastie.

Anybody else notice...

Hercynium on 2007-09-18T13:33:46

As I was reading the "Traffic Report" I thought to myself that it sounded just like the helicopter broadcasts on WBZ-1030 (AM Radio) here around Boston...

So, ending with the Red Sox reference was a terrific bit of irony :)

'Nudder idea...

EvilSuggestions on 2007-09-18T23:08:53

A long time ago, after a talk that Ingy gave about only.pm, I mentioned to him an idea I had that would avoid hard coding module versions into scripts, but would address the common issue that makes people want to use only.pm in the first place - that your script used to work, but installation of a later version of a module breaks it. And moreover, it leverages one of our favorite topics - testing!

The basic idea is that you write a test suite for your script, run the suite (using some tool that's probably going to resemble prove with some added bells and whistles), and if it passes all tests, capture the list of modules loaded during testing and their associated versions (crawling %INC, etc), and put that somewhere as a "test certificate".

Then, when running your script outside the test suite, it has some pragma like "use only::certified;" that reads the certificate, and loads only the module versions that passed certification testing earlier. Or to put that another way: in order to be used in your script, a particular version of a module must be "certified" for use in your script.

Later on, newer versions of those modules could be installed, and other scripts on your system could use them, but your script should remain unharmed. If you then decide that you want to try to pick up newer module versions, you run through your certification process again, and only if all the tests pass does a new certificate get generated.

Obviously, challenges like where do you store the certificates, testing scripts as opposed to modules, and capturing accurate module version info during testing would need considerable thought before something like this could be implemented, but that's the basic outline.
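To make the outline a bit more concrete, here is a rough Python sketch of the certificate round trip. Everything in it is an assumption for illustration: the function names, the JSON layout, and the plain dictionaries standing in for what a real tool would collect by crawling %INC. It only reports mismatches; actually loading the certified versions is the hard part and is not shown.

```python
import json


def write_certificate(loaded, tests_passed):
    """Return a JSON certificate of module versions, or None if tests failed."""
    if not tests_passed:
        return None
    return json.dumps({"modules": dict(sorted(loaded.items()))}, indent=2)


def uncertified(certificate, available):
    """List certified modules whose available version differs from the certificate."""
    certified = json.loads(certificate)["modules"]
    return sorted(
        name
        for name, version in certified.items()
        if available.get(name) != version
    )


# Versions seen while the test suite passed (stand-in for crawling %INC).
loaded_during_tests = {"DBI": "1.59", "Test::More": "0.70"}
certificate = write_certificate(loaded_during_tests, tests_passed=True)

# Later, someone upgrades Test::More on the box.
available_now = {"DBI": "1.59", "Test::More": "0.71"}
print(uncertified(certificate, available_now))
```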

Re:'Nudder idea...

schwern on 2007-09-19T02:10:55
Interesting idea, but it's still a "freeze the world" solution and doesn't address the "works on my machine" problem.

CPAN failures are often caused by cross-platform issues. What if the person who wrote the certificate was using a different version of Perl, operating system, file system, configuration of Perl, etc... than you?

However, the basic idea is interesting. I've pondered how one could harvest CPAN testers information to produce a map of what versions of what modules on what platforms work with others and get a "best fit" for your platform.

Re:'Nudder idea...

EvilSuggestions on 2007-09-20T01:14:18

I probably should have clarified that the idea I was proposing was for local use on a given box, not a CPAN-wide certification. So, for what I want to use it for, it doesn't necessarily feed any useful info back up the chain to the module developer (but I guess it could be extended to, in a CPAN Testers kind of way). But it does address what I consider to be an all too common situation: my script worked yesterday, it doesn't work today, what happened?

Basically, the common use case that I'm picturing (and have personally endured pain from before), is at an ISP with a shared server: you get your script working happily with the set of modules that happen to be installed on a given day, then the ISP upgrades some module to satisfy some other user, and suddenly your script quits working - or even worse, it silently starts doing the wrong thing! This is a big reason why many ISPs take a very conservative approach to installing or upgrading CPAN modules. Which has the disadvantage that you end up having to wait months or even years to pick up the bugfixes and new features that might be available in the latest versions (unless you then choose to install into your own dir, but that's a whole different kettle of fish :).

So, whereas you see this as a "freeze the world" approach, I see it as an approach that tries to stop "spooky action at a distance". There are few things more frustrating than having a script that you didn't change one bit break due to something that someone else did, somewhere else.

Another way to view it is that we're following the test driven development model so much when writing modules that we try to have appropriate tests in place before changing even one line of _our_ code, but after that we're willing to magically snarf in changes from anyone else's code willy-nilly. Sometimes that feels a little penny wise, pound foolish. Or maybe we just trust others more than ourselves ;).

How this (c|sh)ould work...

RGiersig on 2007-09-21T12:49:42

Sorry if this is a little bit terse, I don't have much time...

1) Module authors list dependencies of their modules in Makefile.PL/META.yml (they already do, don't they?)

2) PAUSE parses this info and puts it into a cross-reference DB.

3) Authors can enter compatibility info into this DB via web interface, just saying "module M works with version X of module A and version Y of module B" or "module M doesn't work with version Z of module C". (could/should this be part of rt.cpan.org? I think so...)

3a) can also be augmented by cpan-tester-results?

4) CPAN publishes this x-ref list

5) cpan(p?) shell can download this x-ref list and finally present intelligent suggestions/questions to the user:

"by updating to version Z of module C you will break working module M. really proceed? N/y"

"you want to install module M that requires module A. do you want to install the latest version of module A (which may or may not work with module M) or better install version X of module A that is known to work?"

etc. etc. ad nauseam... :-)

problem solved, now somebody go implement... *ggg*
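For whoever does go implement it, the client side of steps 3 to 5 might look something like this Python sketch. The `CompatRecord` shape and both function names are assumptions made up for the example; the module and version names (M, A, C, X, Z) are the placeholders from the list above, not real distributions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompatRecord:
    """One hypothetical row of the x-ref list: does module work with dep at version?"""
    module: str
    dependency: str
    dep_version: str
    works: bool


def upgrade_warnings(xref, installed, dependency, new_version):
    """Questions to ask before updating dependency to new_version."""
    return [
        f"by updating to version {new_version} of module {dependency} "
        f"you will break working module {rec.module}. really proceed? N/y"
        for rec in xref
        if rec.dependency == dependency
        and rec.dep_version == new_version
        and not rec.works
        and rec.module in installed
    ]


def known_good(xref, module, dependency):
    """Versions of dependency that are recorded as working with module."""
    return [
        rec.dep_version
        for rec in xref
        if rec.module == module and rec.dependency == dependency and rec.works
    ]


xref = [
    CompatRecord("M", "A", "X", works=True),
    CompatRecord("M", "B", "Y", works=True),
    CompatRecord("M", "C", "Z", works=False),
]

for question in upgrade_warnings(xref, installed={"M"}, dependency="C", new_version="Z"):
    print(question)
print(known_good(xref, "M", "A"))
```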