HASSAN CHOP

TorgoX on 2002-02-23T19:51:30

Dear Log,

Putting my corpus linguistics superpowers to use, I recently surveyed all the various Oriental junk mail I've been getting lately, and decided that the most common strings to go on a killing spree for are:

  • µÄ ("\xb5\xc4")
  • ±â ("\xb1\xe2")
  • [escape]$B ("\e\$B")
Those come up in email that's in Asian encodings even when not declared as such in the MIME headers.
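
For anyone who wants to try the same check, here is a minimal sketch in Python rather than the Perl-style literals above. It assumes you have the raw, undecoded bytes of a message; the names `SIGNATURES` and `matching_signatures` are made up for illustration and are not from any mail filter.

```python
# The three byte strings from the list above.
# The Perl "\e" escape is the ESC byte, 0x1b.
SIGNATURES = (b"\xb5\xc4", b"\xb1\xe2", b"\x1b$B")


def matching_signatures(raw_message):
    """Return the signatures that occur anywhere in the raw message bytes."""
    return [sig for sig in SIGNATURES if sig in raw_message]


def looks_like_undeclared_asian_mail(raw_message):
    """True if at least one signature is present (a crude heuristic)."""
    return bool(matching_signatures(raw_message))
```

Since this matches on raw bytes, it will not see anything inside a base64 or quoted-printable body, and it may occasionally fire on legitimate 8-bit text, so treat a hit as a hint rather than proof.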


Why those in particular?

pne on 2002-02-25T15:58:59

The first appears to have been chosen for "de" (roughly, "of") in Simplified Chinese (GB2312). The third looks like the JIS escape sequence which begins Japanese text in "JIS" encoding. But what is the second one? Is it supposed to be Korean "gi" in KSC 5601? And aren't you missing some typical Big5 character for Taiwanese stuff? (The second character would, in Big5, be a character which means "last day of the month or year", if I'm not mistaken, but that's probably not particularly common.)

Re:Why those in particular?

TorgoX on 2002-02-25T18:28:21

Beats me. I just ran an n-gram frequency count on some junk mail I'd gotten, and those came up as frequent.
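
A frequency count of that kind could look roughly like the sketch below. This is not the script that was actually run; it is a guess at the general shape, written in Python, with a hypothetical function name `byte_ngram_counts`. Skipping pure-ASCII n-grams is an added assumption, so that ordinary English text does not swamp the results.

```python
from collections import Counter


def byte_ngram_counts(messages, n=2):
    """Count byte n-grams across an iterable of raw message bytes.

    Assumption: n-grams made only of plain ASCII are ignored; an n-gram
    is kept if it contains a byte >= 0x80 or the ESC byte (0x1b).
    """
    counts = Counter()
    for raw in messages:
        for i in range(len(raw) - n + 1):
            gram = raw[i:i + n]
            if any(b >= 0x80 or b == 0x1B for b in gram):
                counts[gram] += 1
    return counts
```

Calling `byte_ngram_counts(corpus).most_common(10)` on a list of raw messages would then give the ten most frequent candidates to inspect by hand.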