Thank you CPAN

btilly on 2008-05-24T05:02:02

Everyone's favorite reason to use Perl just came through for me again. Let me give some background.

At $work I am in charge of reporting. Our business has something called events. Different events are treated differently, and sometimes we make money from them and sometimes we don't. Sometimes we make more money from them, and sometimes less. One of the things we want reports for is to figure out what factors make events make more money. (Obviously because we want to make more events make money for us!)

Now I have a fairly flexible report that we'll call revenue per event, because that is its name. It lets us see revenue per event at various ages, broken out by various combinations of factors, such as whether passwords are needed to log in, whether gift certificates were added, whether specific promotions were run, that kind of thing. This is a very useful report. We can see that, for instance, running a promotion brings in money (duh) and tell roughly how much more money events with that promotion make.

But we have a problem. You see, we have a pretty good idea what factors make events make more money. (Unfortunately we don't control how an event is set up; we handle events as a service for our direct customers.) So when people take our advice, they do several good things at once. How can we tell how much each individual thing contributes?

Hmm... let's see. Sounds like we need to do some sort of multi-variable linear regression. Why don't we look on CPAN and find Statistics::Regression and see if that works? Oh look, it does! I've used it in the past. But look, it has added a method called standarderrors. What is that? (Run some tests, form a hypothesis, email the author, get confirmation.) Goody, not only can we fit a linear regression, we can also get estimates of how much random noise in the data might be throwing off the coefficients! I had been bothered by the fact that people tend to take the numbers I produce as gospel, with no eye to whether those numbers have any statistical validity.
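
To make the idea concrete, here is a rough sketch of what a multi-variable regression with standard errors computes. It is plain Python using only the standard library, not the Statistics::Regression code and not the report described in this post. The factor names (promo, gift_cert), the revenue figures, and the helper functions invert and ols are all made up for illustration.

```python
# Minimal sketch: ordinary least squares with coefficient standard errors.
# All data and names here are hypothetical, chosen only for illustration.

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    # Augment m with the identity matrix on the right.
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: factors are perfectly collinear")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def ols(rows, y):
    """Return (coefficients, standard_errors) for a least-squares fit of y on rows."""
    n, k = len(rows), len(rows[0])
    if n <= k:
        raise ValueError("need more observations than coefficients")
    # Normal equations: beta = (X'X)^-1 X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    # Residual variance, then se_i = sqrt(sigma^2 * [(X'X)^-1]_ii)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se

# Each row is [constant, promo, gift_cert]; y is revenue per event (invented).
rows = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0],
        [1, 0, 1], [1, 0, 1], [1, 1, 1], [1, 1, 1]]
revenue = [105, 95, 155, 145, 125, 115, 175, 165]

beta, se = ols(rows, revenue)
for name, b, s in zip(["const", "promo", "gift_cert"], beta, se):
    print(f"{name:10s} coefficient {b:8.2f}  standard error {s:6.2f}")
```

Each coefficient estimates how much one factor adds with the others held fixed, and the standard error next to it says how far that estimate could plausibly be off because of noise. A coefficient that is small relative to its standard error should not be taken as gospel.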

Hrm, do I trust the module? I take a legitimate pride in knowing a fair amount of math, but I know I don't know how to do this. Well, look at the source code and... holy crap! OK, for me to learn this to my satisfaction would take a long time. What programmer contributed it, and do I trust him? Rummage around and do some research... oh, he's a professor at Brown. He teaches courses on multi-variable statistics. I think I can trust that he knows his stuff! :-)

Getting my report to do multi-variable regression and display it the way I wanted still wasn't easy. But at least it was hard for programming reasons (some day I need to write something explaining why it may be important to put a condition in an ON clause rather than a WHERE clause - I spent an hour tracking down the resulting bug), and not because I couldn't figure out the math.
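
The post doesn't say what the ON versus WHERE bug actually was, so take the following as a guess at the general shape of the problem rather than a reconstruction. One common way it bites is with a LEFT JOIN: a condition on the right-hand table keeps unmatched rows when it sits in the ON clause, but silently drops them when it sits in the WHERE clause. The sketch uses Python's sqlite3 module with invented events and promotions tables.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE promotions (event_id INTEGER, kind TEXT);
    INSERT INTO events VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    INSERT INTO promotions VALUES (1, 'gift'), (2, 'discount');
""")

# Condition in ON: every event is kept; events without a 'gift'
# promotion simply get NULL for p.kind.
on_rows = conn.execute("""
    SELECT e.name, p.kind
    FROM events e
    LEFT JOIN promotions p ON p.event_id = e.id AND p.kind = 'gift'
    ORDER BY e.id
""").fetchall()

# Condition in WHERE: the filter runs after the join, so rows where
# p.kind is NULL or not 'gift' are thrown away, and the LEFT JOIN
# behaves like an inner join.
where_rows = conn.execute("""
    SELECT e.name, p.kind
    FROM events e
    LEFT JOIN promotions p ON p.event_id = e.id
    WHERE p.kind = 'gift'
    ORDER BY e.id
""").fetchall()

print("ON:   ", on_rows)
print("WHERE:", where_rows)
conn.close()
```

In a report that compares events with and without some factor, the WHERE version would quietly lose every event that lacks the factor, which is exactly the comparison group a regression needs.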

Where else but CPAN would you expect to find something like that?


where else but CPAN

hdp on 2008-05-24T12:55:25

I wouldn't be surprised for such a thing to exist outside of CPAN.

I might be a little surprised at finding it, though. I'd expect the Python version to be called something like pyCruncher, and the Ruby version would be maybe "Gazelle".

CRAN: R's family of Internet sites for packages

mr_bean on 2008-05-25T13:58:00

According to http://www.r-project.org/about.html, R, the language and environment for statistical computing and graphics, has packages covering a very wide range of modern statistics on CRAN.