The right and wrong of perlfaq

brian_d_foy on 2004-10-30T22:34:38

I had to edit perlfaq6's "I put a regular expression into $/ but it didn't work. What's wrong?" last night, so I did what I usually do: use a word from the question title to jump to the right place in the document. I figured the right word would be "wrong", but as I jumped from one instance of "wrong" to the next, I thought there were an awful lot of "wrong"s. I hadn't really thought about it before: what's the balance of "wrong" and "right" in the perlfaq? Who's winning?

#!/bin/sh

cd /Users/brian/Dev/perlfaq

echo "    doc           wrong    right"
echo "----------------------------------"

for doc in perlfaq[123456789].pod; do
    wrong=`grep -c -i wrong $doc`
    right=`grep -c -i right $doc`
    printf '%-12s %8d %8d\n' $doc $wrong $right
done


Even without tallying the totals, I see that "right" wins out over "wrong", although perlfaq6, the doc I was editing, does have the highest number of "wrong"s.

    doc           wrong    right
----------------------------------
perlfaq1.pod        0        4
perlfaq2.pod        0        4
perlfaq3.pod        1        8
perlfaq4.pod        4       12
perlfaq5.pod        5        3
perlfaq6.pod        6        6
perlfaq7.pod        4       11
perlfaq8.pod        2        5
perlfaq9.pod        1        3
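
For the record, the totals can be tallied straight from the table. This is a minimal sketch and not part of the original script: the helper name tally_totals is made up for illustration, and the rows are copied from the output above with the column padding dropped.

```shell
#!/bin/sh
# Sum the "wrong" (column 2) and "right" (column 3) counts read from stdin.
# tally_totals is a hypothetical helper name, not something from the post.
tally_totals() {
    awk '{ wrong += $2; right += $3 }
         END { printf "%-12s %8d %8d\n", "total", wrong, right }'
}

# Rows copied from the table above.
tally_totals <<'EOF'
perlfaq1.pod 0 4
perlfaq2.pod 0 4
perlfaq3.pod 1 8
perlfaq4.pod 4 12
perlfaq5.pod 5 3
perlfaq6.pod 6 6
perlfaq7.pod 4 11
perlfaq8.pod 2 5
perlfaq9.pod 1 3
EOF
```

With those rows it reports 23 lines for "wrong" and 56 for "right".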


Curiously, the distribution of "wrong"s across the nine documents looks like a bell curve, although not quite a symmetrical one.

|
|           *
|         * *
|       * * * *
|       * * * *
|       * * * * *
|     * * * * * * *
0+------------------
  1 2 3 4 5 6 7 8 9


This gives me a chance to play with R, a statistical package. At first I thought "R" must be a really bad name because it would be hard to find in Google, but it's the second result (the first is the stock quote for "R" (Ryder System Inc)). "R" is slick: I wish I had this when I was doing chemistry.

albook_brian[791]$ R

R : Copyright 2004, The R Foundation for Statistical Computing
Version 2.0.0 (2004-10-04), ISBN 3-900051-07-0

> freq <- c( 1,4,5,6,4,2,1 )
> mean(freq)
[1] 3.285714
> median(freq)
[1] 4
> var(freq)
[1] 3.904762
> sd(freq)
[1] 1.976047

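The mean, variance, and standard deviation from the R session can be sanity-checked with awk. This is a rough sketch rather than a replacement for R: summary_stats is a hypothetical helper name, and it uses the n-1 divisor, which is what R's var() uses. It leaves out the median, which would need a sort first.

```shell
#!/bin/sh
# Mean, sample variance, and standard deviation for numbers on stdin,
# one per line. summary_stats is a hypothetical name used for illustration.
summary_stats() {
    awk '{ n++; sum += $1; sumsq += $1 * $1 }
         END {
             mean = sum / n
             # Sample variance: divide by n - 1, matching R var()
             var = (sumsq - n * mean * mean) / (n - 1)
             printf "mean %.6f\nvar %.6f\nsd %.6f\n", mean, var, sqrt(var)
         }'
}

# The same seven counts passed to c() in the R session.
printf '%s\n' 1 4 5 6 4 2 1 | summary_stats
```

The three numbers it prints agree with the R output above to six decimal places.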

Still, "wrong" might be the right word even if it seemed to show up a lot. I modified my shell script to check the other words too, and on a second revision, to check some words with their juxtaposed punctuation, thinking that combination would be even less frequent.

#!/bin/sh

cd /Users/brian/Dev/perlfaq

doc=perlfaq6.pod

echo "     doc        word            count"
echo "-------------------------------------"

for word in "I" "put" "a" "regular" "expression" "into" "$/" \
    "but" "it" "didn't" "work" "work." "What's" "wrong" "wrong?"
do
    count=`grep -i -c $word $doc`
    printf '%-15s %-15s %4d\n' $doc $word $count
done


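If I wanted to jump right to the question, "didn't" is the word to choose, although "wrong?" gets me there in at most two hops. I shouldn't choose "I", "it", or "a". Their counts are lower than the true number of occurrences because the -c switch only counts matching lines, remember. Curiously, "work" seems to always show up next to a full stop, although grep reads the "." in "work." as "any character", so the matching counts may only mean that "work" never ends a line.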

     doc        word            count
-------------------------------------
perlfaq6.pod    I                456
perlfaq6.pod    put                7
perlfaq6.pod    a                437
perlfaq6.pod    regular           27
perlfaq6.pod    expression        26
perlfaq6.pod    into               6
perlfaq6.pod    $/                12
perlfaq6.pod    but               20
perlfaq6.pod    it               120
perlfaq6.pod    didn't             1
perlfaq6.pod    work               8
perlfaq6.pod    work.              8
perlfaq6.pod    What's             4
perlfaq6.pod    wrong              6
perlfaq6.pod    wrong?             2
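
Two grep details sit behind those numbers: -c counts matching lines rather than occurrences, and an unescaped "." in the pattern matches any character. The sketch below shows both on a throwaway file. The sample text and the helper names (count_lines, count_hits, count_fixed) are invented for illustration, and it assumes a grep that supports -o, as GNU and BSD grep do.

```shell
#!/bin/sh
# Throwaway sample file; the content is made up, not taken from perlfaq6.
sample=$(mktemp)
cat > "$sample" <<'EOF'
It works, but it is wrong.
Does it work?
That should work.
EOF

# Hypothetical helpers, named for illustration only.
count_lines() { grep -i -c -- "$1" "$2"; }                      # lines that match
count_hits()  { grep -i -o -- "$1" "$2" | wc -l | tr -d ' '; }  # every occurrence
count_fixed() { grep -i -c -F -- "$1" "$2"; }                   # literal string, no regex

count_lines it "$sample"        # 2: two lines contain "it"
count_hits it "$sample"         # 3: the first line contains it twice
count_lines 'work.' "$sample"   # 3: "." also matches the "s" and the "?"
count_fixed 'work.' "$sample"   # 1: only the line with a real full stop

rm -f "$sample"
```

Quoting the pattern and adding -F is one way to make a search for "work." mean a literal full stop.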


"/Users" dir?

offerk on 2004-12-02T13:59:09

Not exactly a standard directory name, is it?
Just curious, do you use Gobo Linux (http://www.gobolinux.org/) or is there another reason for such a name?

--
Offer Kaye

Re:"/Users" dir?

brian_d_foy on 2004-12-02T18:58:30

It depends on what you think "standard" is. On Mac OS X it's pretty standard.