According to my word count program, "perl" is only the 17th most frequent word in the Llama, 4th Edition:
the 6494
a 2862
to 2710
of 2235
that 1689
in 1658
you 1571
is 1518
and 1394
it 956
for 926
if 917
this 832
as 791
be 706
or 671
perl 660
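The counting program itself isn't shown in this post. As a rough sketch of how such a tally could be produced (written in Python here, with the `word_counts` and `top_words` names and the sample sentence invented purely for illustration), something along these lines would do:

```python
import re
from collections import Counter

def word_counts(text):
    """Return a Counter of lowercased words found in text."""
    # Assumption: a "word" is a run of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def top_words(text, n=20):
    """Return the n most frequent (word, count) pairs."""
    return word_counts(text).most_common(n)

# Hypothetical sample text, not taken from either book.
sample = "The llama and the alpaca: Perl, perl, PERL and the camel."
for word, count in top_words(sample, 3):
    print(word, count)
```

Lowercasing before counting is what folds "Perl", "perl", and "PERL" into the single entry seen in the lists here.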
We're not doing much better in the re-write of the Alpaca, where "perl" has slipped to 21st. We still have time to change that, but from the looks of it I'll have to include a couple of paragraphs of just "perl perl perl ...".
the 4977
to 1952
a 1917
of 1340
you 1331
in 1083
and 1081
is 1048
that 952
for 741
as 636
it 608
this 605
can 543
if 532
with 444
be 442
an 383
or 351
your 351
perl 321
I could have written something to go through all of our magazine columns, but then I'd have to use a module or something.
Yes but...
saorge on 2005-10-31T09:01:45
Perl is the first relevant word, because the others are "stop words". There are some modules on the CPAN to work with these stop words. There are also a lot of modules to index text (even if your first intention isn't really to index text in the sense of a search engine). The number of occurrences of a term is often saved in the database because this value is used to compute the ranking of a document after a search (the best known of these methods is TF.IDF). So it could be simpler to query the database. perlindex is a script available on the CPAN that indexes the Perl documentation available on your hard disk. One option of this script asks for the total number (-d threshold for the occurrence) of occurrences. On my box, head is the most often used word.
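To make the stop-word point concrete, here is a minimal sketch in Python. The counts are the ones posted above for the Llama; the stop list is a tiny hand-picked assumption (real lists, such as those shipped in CPAN modules, are far longer), and the `without_stop_words` helper is a hypothetical name.

```python
# Counts copied from the Llama, 4th Edition list in the post.
llama_counts = {
    "the": 6494, "a": 2862, "to": 2710, "of": 2235, "that": 1689,
    "in": 1658, "you": 1571, "is": 1518, "and": 1394, "it": 956,
    "for": 926, "if": 917, "this": 832, "as": 791, "be": 706,
    "or": 671, "perl": 660,
}

# Assumption: a hand-picked stop list just large enough for this example.
STOP_WORDS = {
    "the", "a", "to", "of", "that", "in", "you", "is", "and",
    "it", "for", "if", "this", "as", "be", "or",
}

def without_stop_words(counts, stop_words=STOP_WORDS):
    """Return a copy of counts with every stop word removed."""
    return {w: c for w, c in counts.items() if w not in stop_words}

# Rank what is left by frequency, highest first.
ranked = sorted(without_stop_words(llama_counts).items(),
                key=lambda pair: pair[1], reverse=True)
print(ranked[0])  # ('perl', 660)
```

With the stop words filtered out, "perl" is the top entry, which is the sense in which it is the first relevant word.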
Re:Yes but...
brian_d_foy on 2005-10-31T16:39:40
The joke doesn't work as well when "perl" is at the top of the list for both books.
I don't think it's simpler to make a database. My script was only 10 lines, including blank ones :)
Re:Yes but...
n1vux on 2005-10-31T21:44:15
Dear brian,
Sorry I went Pedantic on your joke along with saorge. The perl perl perl ... graf you threatened to add to the next book is funny. I just got caught up in search boffin blather about stopwords ... a hot-button thing.
- Bill
Stopwords and relevance
n1vux on 2005-10-31T12:49:12
Right on. Perhaps even stronger: Perl is the first substantive word, relevant or otherwise. But stopwords is the correct searchwonk jargon for this. I suspect philologists have their own term; Larry probably would have a word that means "not a helper verb, particle, article, or pronoun". Perl is the first substantive noun, proper or common, on both lists.
Be happy!
-- Bill
former search boffin