I'm writing a Bayesian analysis tool that tries to guess whether a given card is worth playing in tournaments or not.
Like so many other things, it's required several different stepping stones, and until a few days ago things were progressing at a rather rapid clip. I'd pounded out a converter to change the WotC-formatted file into YAML, a scraper to compile a list of all the cards in the top 8 Type 2 decks, and a splitter that puts all the non-suck cards into a pile called 'good' and all the suck cards into a pile called 'suck'.
It was then that I noticed the converted output wasn't coming out right. I'd written the WotC converter wrong, and as a result some of the values were ending up in the wrong hash elements. Sigh.
I'm trying to rewrite it properly, but it's been a little slow.
The data set for this is so large and varied that I'm not sure I can produce a small case to test with, but maybe I should try anyway.
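A small case might look something like the rough sketch below. It's in Python purely for illustration, and nearly everything in it is an assumption: the "Label: value" record layout is made up (it is not the real WotC format), the card data is placeholder text, and `parse_cards` and `misplaced_fields` are hypothetical stand-ins rather than the actual converter. The idea is just to hand-write two or three records, write down by hand what each hash should contain, and flag any value that lands under the wrong key.

```python
# Hypothetical sample: two records in a made-up "Label: value" layout.
# This is NOT the real WotC file format, and the cards are placeholders.
SAMPLE = """\
Name: Test Creature
Cost: 1G
Type: Creature
Text: Placeholder rules text.

Name: Test Spell
Cost: 2U
Type: Instant
Text: More placeholder text.
"""

# What each record should look like after conversion, written out by hand.
EXPECTED = [
    {"name": "Test Creature", "cost": "1G", "type": "Creature",
     "text": "Placeholder rules text."},
    {"name": "Test Spell", "cost": "2U", "type": "Instant",
     "text": "More placeholder text."},
]


def parse_cards(text):
    """Toy stand-in for the converter: one dict per blank-line-separated record."""
    cards = []
    for block in text.strip().split("\n\n"):
        card = {}
        for line in block.splitlines():
            # Split on the first colon only, so colons in the value survive.
            key, _, value = line.partition(":")
            card[key.strip().lower()] = value.strip()
        cards.append(card)
    return cards


def misplaced_fields(cards, expected):
    """Return (record index, key, got, wanted) for every value that is wrong."""
    problems = []
    for i, (card, want) in enumerate(zip(cards, expected)):
        for key, wanted in want.items():
            if card.get(key) != wanted:
                problems.append((i, key, card.get(key), wanted))
    return problems


cards = parse_cards(SAMPLE)
print(misplaced_fields(cards, EXPECTED))  # an empty list means every field landed where it should
```

A fixture this small obviously won't cover every oddity in the full data set, but each weird record that breaks the converter can be pasted in as another sample with its own hand-written expected hash.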