Moby Lexicon and the Challenger corpus

TorgoX on 2004-08-16T00:26:57

Dear Log,

The Moby Lexicon is a happy time. It's a bunch of public-domain wordlists. And here's a link to them.

Also, one of the few good things I got out of my year at Northwestern's Department of Linguistics is awareness of what everyone called "the Challenger corpus". These are the transcripts of the government hearings held to figure out why the space shuttle Challenger done sploded. It's handy because it's a pretty large corpus of spoken English, tidily transcribed.

And here it is as a 1.1MB zip file, containing the contents as simple (greppable) HTML files, which I originally got off of a NASA server.

I remember writing a paper in grad school on uses of the word "even", and I remember that this quote was somehow important to whatever (long forgotten) point I was trying to make in the paper:

MR. JONES: Well, I think it is, and of particular concern, and I happen to have had a military background as General Kutyna does. The rules of engagement are horrifying. An airplane comes toward the launchpad and you get terribly concerned. At what point do you shoot him down? Only to find that it was a poor young man from Memphis on his way home from the Bahamas, and was curious. I mean, he shouldn't have been in that air space.

DR. COVERT: Or lost, even.
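
For anyone who wants to hunt for quotes like that without unzipping anything, here is a rough sketch of grepping the archive for a word. It's in Python purely for illustration, and it rests on assumptions of mine: the zip path is whatever you saved the file as, the members end in .html or .htm, and decoding as Latin-1 is good enough for the transcripts. The function names are made up for this example.

```python
import re
import zipfile

# Crude tag stripper; adequate for simple HTML, not a real parser.
TAG_RE = re.compile(r"<[^>]+>")


def matching_lines(lines, word):
    """Return tag-stripped lines that contain `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for line in lines:
        text = TAG_RE.sub("", line).strip()
        if pattern.search(text):
            hits.append(text)
    return hits


def search_zip(zip_path, word):
    """Yield (member name, line) for each hit in the HTML files of a zip.

    Assumes the members are HTML and that Latin-1 decoding is acceptable.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".html", ".htm")):
                continue
            raw = archive.read(name).decode("latin-1")
            for hit in matching_lines(raw.splitlines(), word):
                yield name, hit


if __name__ == "__main__":
    sample = [
        "<p>DR. COVERT: Or lost, even.</p>",
        "<p>The event was delayed.</p>",
    ]
    # Only the first line matches; "event" is not the whole word "even".
    print(matching_lines(sample, "even"))
```

Matching on word boundaries keeps "event" and "evening" out of the results, which matters a lot for a word as short as "even". To run it over the real archive, loop over search_zip() with the path to your local copy.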


WESUN Puzzlemaster - Uses for the Lexicon

n1vux on 2004-08-18T16:02:29

I gave a talk at Boston.pm on using Perl and the Lexicons to cheat at the NPR Weekend Edition Sunday Puzzle, as an alternative to Perl Golf.
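
To give a flavor of the general idea (this is not lifted from the talk, and it's sketched in Python rather than Perl), here is a minimal anagram lookup over a wordlist. It assumes a plain-text list with one word per line; the function names and the file you feed it are hypothetical.

```python
from collections import defaultdict


def load_words(path):
    """Read one word per line from a plain-text wordlist (assumed format)."""
    with open(path, encoding="latin-1") as handle:
        return [line.strip() for line in handle]


def build_anagram_index(words):
    """Map each word's sorted letters to the set of words sharing them."""
    index = defaultdict(set)
    for word in words:
        cleaned = word.strip().lower()
        if cleaned.isalpha():  # skip phrases, hyphenated entries, blanks
            index["".join(sorted(cleaned))].add(cleaned)
    return index


def anagrams_of(word, index):
    """Return the other words in the index with the same letters."""
    key = "".join(sorted(word.lower()))
    return sorted(index.get(key, set()) - {word.lower()})


if __name__ == "__main__":
    demo = ["listen", "silent", "enlist", "tinsel", "google", "Inlets"]
    print(anagrams_of("listen", build_anagram_index(demo)))
```

Indexing by sorted letters means each lookup is a single dictionary hit, so one pass over the wordlist is enough to answer any number of anagram-style puzzle questions.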

The slides (corrected) are at the following spider-resistant url (paste it back together): http: / / world.std.com/ ~wdr/ x/wesun/WESun.html

I'm not the only one this demented.

Bill

Re:WESUN Puzzlemaster - Uses for the Lexicon

n1vux on 2004-08-18T16:04:12

Ugh, hit [Submit] instead of [Preview] the last time... the PerlMonks URL is corrected.

thank you vm

tinman on 2004-08-18T18:16:23

Always on the lookout for those elusive, tidily transcribed English transcripts for sample runs.