Wrote a letter frequency analyzer, because I wanted to. (Actually, it's the first step in a not-altogether ambitious project to write something that attempts to solve simple shift ciphers automatically.) Did a baseline frequency on a dictionary file and used Storable to keep those stats. (I know, I need something better, but it was the biggest wad of English text I had available that I was reasonably sure wasn't riddled with errors or "stylistic" typos. Any suggestions for a better text are appreciated.)
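For anyone curious what the two pieces look like, here is a minimal sketch in Python. It is not the Perl code described above: the function names (`letter_frequencies`, `shift_text`, `guess_shift`) are made up for illustration, and scoring each candidate shift by the sum of squared differences against the baseline is just one simple choice, not necessarily what the real project does.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase


def letter_frequencies(text):
    """Return {letter: relative frequency} for a-z, ignoring everything else."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        # No letters at all: report zeros rather than dividing by zero.
        return {ch: 0.0 for ch in ALPHABET}
    return {ch: counts[ch] / total for ch in ALPHABET}


def shift_text(text, key):
    """Shift each ASCII letter forward by `key` places; leave other characters alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def guess_shift(ciphertext, baseline):
    """Return the shift (0-25) whose decryption best matches the baseline frequencies."""
    def score(key):
        # Undo a candidate shift, then measure the distance from the baseline.
        observed = letter_frequencies(shift_text(ciphertext, -key))
        return sum((observed[ch] - baseline[ch]) ** 2 for ch in ALPHABET)

    return min(range(26), key=score)
```

In use, `baseline` would come from running `letter_frequencies` over a large body of ordinary English, and `shift_text(ciphertext, -guess_shift(ciphertext, baseline))` would give the candidate plaintext. Very short ciphertexts can easily fool a frequency-only guess like this one.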
Started learning how to footle with CGI::Application and HTML::Template, and I'm pleasantly surprised. One little hangup -- not being able to make clickable images -- is not such a big deal, and I suspect there's a way to do it that I haven't discovered yet.
No other news...
Re:Project Gutenberg
chaoticset on 2003-07-22T15:31:51
Any specific ones you'd recommend there? I wouldn't be against using a group of texts -- the process takes about two minutes with a 1.3 meg text file, and I want this to be really good, so I'd be willing to devote a couple of hours to it. I'd like the stats to be much more precise than normal, so I'd be all for processing ten or twelve texts. It's just that the ones I happen to have on my drive at home (forgive me) aren't normal. Perl docs, while reasonably grammatical and correctly spelled, aren't normal text. Neither is Trek fanfic (yes, I have a few, yes, I've read a few, and no, I'm not ashamed). Neither is _The Crying Of Lot 49_, unfortunately.
Re:Project Gutenberg
gizmo_mathboy on 2003-07-22T21:34:37
Funny you should ask. About a month ago on the Perl Quiz of the Week mailing list, one of the quizzes concerned repeated substrings. One of the folks used the following text (extracted from an email):
-----------------------
'The Life and Opinions of Tristram Shandy, Gentleman' by Laurence Sterne, which when downloaded weighs in at around 1 Mo (as compared with 27 Ko for Dan Schmidt's US constitution).
-----------------------
The location of these (from another email):
http://www.dfan.org/constitution.txt
http://www.ibiblio.org/gutenberg/etext97/shndy10.txt
Enjoy. However, for frequency analysis just about anything on Project Gutenberg might work. Well, maybe not the chromosomes.