Mirror::YAML released: Smart mirror discovery and rediscovery

Alias on 2007-01-20T17:03:46

One of the more annoying problems with the CPAN client is that there's no clean way for the client to auto-discover the best CPAN mirror to use.

This leaves us with the kludge of the continent -> country -> servers method.

Frankly, this sucks.

Not only is there no way of ignoring servers that have gone stale, but there's no way of knowing how fast they are.

If a mirror in the US is faster for me than a mirror in Australia, I should be using the one in the US. Even though the Australian server may be geographically closer, that by no means implies that it is closer in network terms (especially in this country).

So what we really need is a relatively efficient and reliable method for the CPAN client to automatically select which mirror to use, to know when a mirror has gone stale and switch to new mirrors, and (in extreme situations) to rediscover the master server (and get a new mirror list) even when the local copy of the master/mirror data is extremely out of date.

So I've had a first shot at solving this problem.

This first, very sketchy implementation of Mirror::YAML is tuned for the 10-15 mirrors of the JSAN, rather than for the 300 mirrors of the CPAN.

But since I have the luxury of using the JSAN as a testbed for systematic improvements I want to later see appear in the CPAN, I plan to make the concept work there first.

None of this is new. It's a similar collection of metadata to what already exists, just spread out a bit more, in the CPAN metadata.

It's just that in this case the metadata is all collected into one extremely small mirror.yml file, small enough to transfer in a single packet. This file sits in the root of the repository, and with a single request the client can verify the mirror URI, check the staleness of the repository, find all the mirrors, benchmark the transfer speed (lag, essentially) of the repository and so on.
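To make the "one request, many answers" idea concrete, here is a rough sketch in Python (not the Perl of Mirror::YAML itself, and not its API). It assumes the file has already been fetched and parsed into a dict, and the key names 'uri', 'timestamp' and 'mirrors' are my own placeholders for illustration, not the real mirror.yml format.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    uri: str           # the URI we actually requested
    uri_matches: bool  # does the file claim to live at that same URI?
    age: float         # seconds since the mirror last synchronised
    lag: float         # seconds the single request took
    mirrors: list      # other mirrors advertised by the file


def probe_from_response(requested_uri, data, started, finished):
    """Derive all the facts from one fetched-and-parsed index file.

    `data` is a dict with hypothetical keys: 'uri', 'timestamp'
    (epoch seconds of the last sync) and 'mirrors' (list of URIs).
    `started` and `finished` are epoch seconds around the request.
    """
    claimed = data.get("uri", "").rstrip("/")
    return Probe(
        uri=requested_uri,
        uri_matches=claimed == requested_uri.rstrip("/"),
        age=max(0.0, finished - data["timestamp"]),
        lag=finished - started,
        mirrors=list(data.get("mirrors", [])),
    )
```

The point is only that nothing beyond the file contents and two clock readings is needed to answer all four questions.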

By pulling this one small file from each mirror, it's possible to intelligently select the best mirrors based on how up to date they are, and how fast files are actually sent from the mirror.
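The selection step could look something like the following sketch, again in Python and again only an illustration. The dict keys ('uri', 'age', 'lag'), the one-day staleness cutoff and the "keep three" default are all assumptions of mine rather than anything Mirror::YAML defines.

```python
def rank_mirrors(probes, max_age=86400.0, keep=3):
    """Pick the best mirrors from a list of probe results.

    Each probe is a dict with hypothetical keys:
      'uri' - the mirror's URI
      'age' - seconds since the mirror last synchronised
      'lag' - seconds it took to fetch the small index file
    Mirrors older than `max_age` are dropped as stale, and the
    remainder are ordered fastest first.
    """
    fresh = [p for p in probes if p["age"] <= max_age]
    fresh.sort(key=lambda p: p["lag"])
    return [p["uri"] for p in fresh[:keep]]
```

With this shape, a fast but badly stale mirror never wins, and a nearby but slow mirror loses to a distant fast one.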

And as mirrors come and go, this should track them properly.
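One way the tracking might work, sketched in Python under assumptions of my own: trust the mirror list advertised by whichever responding mirror synchronised most recently, and if nobody answers at all, fall back to asking the master. The function name, the 'timestamp' and 'mirrors' keys, and the policy itself are hypothetical, not a description of what Mirror::YAML currently does.

```python
def next_mirror_list(responses, master):
    """Work out which mirrors to probe on the next round.

    `responses` maps each mirror URI that answered to a dict with
    hypothetical keys 'timestamp' (epoch seconds of its last sync)
    and 'mirrors' (the mirror list it advertises).
    `master` is the URI of the master server, used as a last resort.
    """
    if not responses:
        # Nothing we knew about is still alive: rediscover via the master.
        return [master]
    newest = max(responses.values(), key=lambda r: r["timestamp"])
    return list(newest["mirrors"])
```

Mirrors that have vanished simply stop appearing in the adopted list, and new ones show up as soon as any fresh mirror advertises them.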

There's obviously tuning to go, and there are a few caveats I'm yet to remove, and of course integration into the JSAN client proper still needs to be done, but I have high hopes that this will be useful, and something we can move into CPAN down the track (or any repository really).