From "perldoc perlunicode":
    Code Points           1st Byte  2nd Byte  3rd Byte  4th Byte
    U+0000..U+007F        00..7F
    U+0080..U+07FF        C2..DF    80..BF
    U+0800..U+0FFF        E0        A0..BF    80..BF
    U+1000..U+CFFF        E1..EC    80..BF    80..BF
    U+D000..U+D7FF        ED        80..9F    80..BF
    U+D800..U+DFFF        ******* ill-formed *******
    U+E000..U+FFFF        EE..EF    80..BF    80..BF
    U+10000..U+3FFFF      F0        90..BF    80..BF    80..BF
    U+40000..U+FFFFF      F1..F3    80..BF    80..BF    80..BF
    U+100000..U+10FFFF    F4        80..8F    80..BF    80..BF

And the equivalent regex:
    # Matches one well-formed UTF-8 sequence. The classes describe byte
    # values, so apply it to raw (undecoded) bytes, not decoded characters.
    qr{
        (?:
            [\x00-\x7f]                                      # U+0000 .. U+007F
          | [\xc2-\xdf] [\x80-\xbf]                          # U+0080 .. U+07FF
          | \xe0        [\xa0-\xbf] [\x80-\xbf]              # U+0800 .. U+0FFF
          | [\xe1-\xec] [\x80-\xbf] [\x80-\xbf]              # U+1000 .. U+CFFF
          | \xed        [\x80-\x9f] [\x80-\xbf]              # U+D000 .. U+D7FF
          | [\xee-\xef] [\x80-\xbf] [\x80-\xbf]              # U+E000 .. U+FFFF
          | \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf]  # U+10000 .. U+3FFFF
          | [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf]  # U+40000 .. U+FFFFF
          | \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf]  # U+100000 .. U+10FFFF
        )
    }x;
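As a quick sanity check that the regex agrees with the table, here is a minimal sketch that anchors it with `\A ... \z` to test whole strings. It assumes the input is an undecoded byte string (for example, read through a `:raw` layer). The names `$utf8_char` and `is_well_formed_utf8` are illustrative, not anything from perlunicode, and the well-formed samples are produced with the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The regex from the post, held in a variable (illustrative name).
# It describes byte values, so it must be matched against raw bytes.
my $utf8_char = qr{
    (?:
        [\x00-\x7f]
      | [\xc2-\xdf] [\x80-\xbf]
      | \xe0        [\xa0-\xbf] [\x80-\xbf]
      | [\xe1-\xec] [\x80-\xbf] [\x80-\xbf]
      | \xed        [\x80-\x9f] [\x80-\xbf]
      | [\xee-\xef] [\x80-\xbf] [\x80-\xbf]
      | \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf]
      | [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf]
      | \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf]
    )
}x;

# Returns 1 if the whole byte string is a sequence of well-formed
# UTF-8 characters (the empty string included), otherwise 0.
sub is_well_formed_utf8 {
    my ($bytes) = @_;
    return $bytes =~ /\A $utf8_char* \z/x ? 1 : 0;
}

# Well-formed input: bytes produced by Encode's strict UTF-8 encoder.
for my $cp (0x41, 0xE9, 0x20AC, 0x1F600) {
    my $bytes = encode('UTF-8', chr $cp);
    printf "U+%04X  %-8s  %s\n", $cp, uc unpack('H*', $bytes),
        is_well_formed_utf8($bytes) ? 'accepted' : 'rejected';
}

# Ill-formed byte strings that the table rules out.
my %bad = (
    'overlong NUL (C0 80)'         => "\xC0\x80",
    'surrogate U+D800 (ED A0 80)'  => "\xED\xA0\x80",
    'above U+10FFFF (F4 90 80 80)' => "\xF4\x90\x80\x80",
    'lone Latin-1 e-acute (E9)'    => "\xE9",
);
for my $label (sort keys %bad) {
    printf "%-30s  %s\n", $label,
        is_well_formed_utf8($bad{$label}) ? 'accepted' : 'rejected';
}
```

The `\xE0 [\xA0-\xBF]`, `\xED [\x80-\x9F]`, `\xF0 [\x90-\xBF]` and `\xF4 [\x80-\x8F]` rows are what exclude overlong forms, surrogates and values above U+10FFFF; a looser "lead byte plus N continuation bytes" pattern would accept all four of the bad samples.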
The regex has proven useful for finding errant Latin-1 characters embedded in files that are otherwise UTF-8.
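For that kind of search, a rough sketch (assuming the files are meant to be UTF-8 and are read as raw bytes) can step through each file with `\G` and `/gc` and record the offset of every byte where no well-formed sequence starts. A Latin-1 `é` is the single byte `E9`, which falls in the `E1..EC` lead-byte row of the table but is normally not followed by two continuation bytes, so it gets reported. `bad_offsets` and `$utf8_char` are illustrative names.

```perl
use strict;
use warnings;

# Same alternatives as the regex in the post; the variable name is illustrative.
my $utf8_char = qr{
    (?:
        [\x00-\x7f]
      | [\xc2-\xdf] [\x80-\xbf]
      | \xe0        [\xa0-\xbf] [\x80-\xbf]
      | [\xe1-\xec] [\x80-\xbf] [\x80-\xbf]
      | \xed        [\x80-\x9f] [\x80-\xbf]
      | [\xee-\xef] [\x80-\xbf] [\x80-\xbf]
      | \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf]
      | [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf]
      | \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf]
    )
}x;

# Returns the byte offsets at which no well-formed UTF-8 sequence starts.
sub bad_offsets {
    my ($bytes) = @_;
    my @bad;
    pos($bytes) = 0;
    while (pos($bytes) < length $bytes) {
        # /gc leaves pos() unchanged when the match fails
        next if $bytes =~ /\G$utf8_char/gc;
        push @bad, pos($bytes);
        pos($bytes) = pos($bytes) + 1;    # step over the offending byte
    }
    return @bad;
}

# Scan each file named on the command line as raw, undecoded bytes.
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!\n";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    next unless defined $bytes;
    for my $offset (bad_offsets($bytes)) {
        printf "%s: offset %d, byte 0x%02X\n",
            $file, $offset, ord substr($bytes, $offset, 1);
    }
}
```

Run it with the files to check as arguments. One caveat: a Latin-1 pair that happens to be valid UTF-8, such as `Ã©` (bytes `C3 A9`), will not be flagged, because it is indistinguishable from a UTF-8 `é`.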