BackPAN Indexer has components and another interface

brian_d_foy on 2008-09-15T05:36:23

I've been doing more work on my BackPAN Indexer, but it's the sort of work that doesn't do any indexing. What I really need to do is be home for more than one day at a time so I can get everything set up on a computer that I don't have to use. Indexing tens of thousands of distributions takes over my MacBook's poor disk drive, and then I can't do much else.

So, in the meantime, I'm cleaning up the structure of the code so I can make things pluggable. There are several components that you can plug in: a Queue class that makes the list of things to process, a Worker class that defines the work to do on each thing in the queue, a Reporter class to store the Worker's results, a Dispatcher to hand out work to the Workers, and an Interface to show the live run information.
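
As a rough illustration of how those five pieces could fit together, here is a minimal sketch. It is in Python rather than the indexer's actual Perl, and every class and method name in it (ListQueue, next_item, process, store, update, run) is hypothetical, made up for the example and not taken from the real code.

```python
from collections import deque


class ListQueue:
    """Queue: makes the list of things to process (hypothetical API)."""

    def __init__(self, items):
        self._items = deque(items)

    def next_item(self):
        # Return the next item, or None when the queue is exhausted.
        return self._items.popleft() if self._items else None


class LengthWorker:
    """Worker: defines the work to do on each item (a stand-in task)."""

    def process(self, item):
        return {"item": item, "length": len(item)}


class MemoryReporter:
    """Reporter: stores the Worker's results, here just in a list."""

    def __init__(self):
        self.results = []

    def store(self, result):
        self.results.append(result)


class LogInterface:
    """Interface: shows live run information, here by collecting lines."""

    def __init__(self):
        self.lines = []

    def update(self, message):
        self.lines.append(message)


class SerialDispatcher:
    """Dispatcher: hands out work, one item at a time in this sketch."""

    def __init__(self, queue, worker, reporter, interface):
        self.queue = queue
        self.worker = worker
        self.reporter = reporter
        self.interface = interface

    def run(self):
        count = 0
        while True:
            item = self.queue.next_item()
            if item is None:
                break
            self.interface.update(f"processing {item}")
            self.reporter.store(self.worker.process(item))
            count += 1
        self.interface.update(f"done: {count} items")
        return count
```

The dispatcher only ever calls one method on each collaborator, so any object with that method can be swapped in. That is the whole of the "pluggable" idea in this sketch.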

It's turned out to be a really nice design (that's getting better). I know I have a good design when the things I want to do next naturally fall out of the design. For instance, I wanted to test the Tk Interface and move things around, but I didn't want to actually process anything. I started working on a test script to mock everything, then I realized I didn't really need mocks because I could plug in null classes to handle the Queue, which would be empty or not, and the Worker, which would just do nothing.
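
The null-class idea can be sketched like this, again in Python rather than the indexer's Perl and with hypothetical names throughout. NullQueue is either empty or hands out a fixed number of placeholder items, NullWorker does nothing, and RecordingInterface stands in for the real Tk interface so the sketch stays self-contained.

```python
class NullQueue:
    """A queue that is empty, or holds `size` placeholder items."""

    def __init__(self, size=0):
        self._remaining = size

    def next_item(self):
        if self._remaining == 0:
            return None
        self._remaining -= 1
        return f"placeholder-{self._remaining}"


class NullWorker:
    """A worker that accepts an item and does nothing with it."""

    def process(self, item):
        return None


class RecordingInterface:
    """Stand-in for a real interface; it only remembers what it was shown."""

    def __init__(self):
        self.lines = []

    def update(self, message):
        self.lines.append(message)


def exercise_interface(queue, worker, interface):
    """Drive the interface through a full run without doing real work."""
    handled = 0
    while True:
        item = queue.next_item()
        if item is None:
            break
        interface.update(f"processing {item}")
        worker.process(item)
        handled += 1
    interface.update(f"done: {handled} items")
    return handled
```

No mocking library is involved. The null classes are ordinary plug-ins that satisfy the same small interface as the real ones, so the code that drives the interface cannot tell the difference.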

So, as I've been moving around the country, I've been working on coding at least two examples of each component. That's a really nice way to design things. A design that works for one thing might not work for another thing. Forcing myself to come up with two non-trivial examples shakes out some of those problems. This flexibility is going to be important if people want to run the indexer themselves, or if someone ever sets up a CPAN Testers style group to do it (i.e. the dispatcher can distribute work around the world).

This week's work has been to create a Curses interface (because if I can't run it in a terminal, it's not Scottish), and something I'm calling the Test Census, which just counts the Test:: modules instead of doing the full indexing (and storing the large amounts of data I collect). If you're looking in the git repo, check out the test_counter branch. There is something a bit broken with the dispatching or the reporting somehow, but I'll think about that next week.
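
To give a feel for what a census-style worker might do, here is a small sketch in Python. It is not the code in the test_counter branch. Scanning for use and require lines with a regular expression is an assumption made for the example, and the names USE_TEST and count_test_modules are hypothetical.

```python
import re
from collections import Counter

# Match "use Test::Foo" or "require Test::Foo" at the start of a line.
# Assumption: a simple line scan is good enough for a rough count; it
# ignores commented-out lines but does not understand POD or heredocs.
USE_TEST = re.compile(
    r"^\s*(?:use|require)\s+(Test::[A-Za-z_][\w:]*)", re.MULTILINE
)


def count_test_modules(sources):
    """Count how many source texts load each Test:: module.

    Each module is counted at most once per source text, so the totals
    say "how many files use it", not "how many times it appears".
    """
    census = Counter()
    for text in sources:
        census.update(set(USE_TEST.findall(text)))
    return census
```

Because it keeps only a counter per module name, a worker like this stores far less than a full index would, which is the point of running a census instead.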

I think the full index of BackPAN will start next week, and I expect it to run for about a week. I figure the error rate will be about 5%, like it was for MiniCPAN, and I'll then spend some time improving the indexer. While the indexer is running, I'll work on the other bits.