I have since chosen to come at this from a different direction, eliminating the problem, but I'm still confused by the behavior I saw:
I am reading in a tab-delimited Unicode text file. Each line is split on tabs into an array, and some of the array elements are then processed individually. One of them may contain katakana or hiragana text (the on or kun readings of a kanji). For my first whack at a database import, I just passed this field through untouched.
Later I added this code, which checks to see what script the field is in:
    @readings = split(/,/,$word[4]);
    foreach $chunk (@readings) {
        $chunk =~ s/\s*//g;
        if ($chunk =~ /\p{InKatakana}/) {
            $on .= $chunk . ",";
        } else {
            $kun .= $chunk . ",";
        }
    }
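For anyone who wants to try the check in isolation, here is a minimal, self-contained sketch of the same classification. The sub name `classify_readings` and the sample readings are made up for illustration and are not from the import script. It assumes the field already holds decoded Perl characters (true for the literal here because of `use utf8`), and it joins the results instead of appending a trailing comma.

```perl
use strict;
use warnings;
use utf8;                               # this source contains literal kana
binmode(STDOUT, ':encoding(UTF-8)');    # encode characters on output

# Hypothetical helper: split a comma-separated readings field into
# on readings (chunks containing katakana) and kun readings (everything else).
sub classify_readings {
    my ($field) = @_;
    my (@on, @kun);
    for my $chunk (split /,/, $field) {
        $chunk =~ s/\s+//g;             # strip all whitespace from the chunk
        next if $chunk eq '';           # skip empty pieces such as "a,,b"
        if ($chunk =~ /\p{InKatakana}/) {
            push @on, $chunk;
        } else {
            push @kun, $chunk;
        }
    }
    return (join(',', @on), join(',', @kun));
}

# Sample field, invented for this sketch.
my ($on, $kun) = classify_readings('ニチ, ジツ, ひ, -び, -か');
print "on=$on kun=$kun\n";
```

With the sample field above, the two katakana chunks should end up in `$on` and the three hiragana chunks in `$kun`.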