From "perldoc perlunicode":

    Code Points            1st Byte  2nd Byte  3rd Byte  4th Byte
    U+0000..U+007F         00..7F
    U+0080..U+07FF         C2..DF    80..BF
    U+0800..U+0FFF         E0        A0..BF    80..BF
    U+1000..U+CFFF         E1..EC    80..BF    80..BF
    U+D000..U+D7FF         ED        80..9F    80..BF
    U+D800..U+DFFF         ******* ill-formed *******
    U+E000..U+FFFF         EE..EF    80..BF    80..BF
    U+10000..U+3FFFF       F0        90..BF    80..BF    80..BF
    U+40000..U+FFFFF       F1..F3    80..BF    80..BF    80..BF
    U+100000..U+10FFFF     F4        80..8F    80..BF    80..BF

And the equivalent regex:

    qr{
      (?:
          [\x00-\x7f]                                      # U+0000 .. U+007F
        | [\xc2-\xdf] [\x80-\xbf]                          # U+0080 .. U+07FF
        | \xe0        [\xa0-\xbf] [\x80-\xbf]              # U+0800 .. U+0FFF
        | [\xe1-\xec] [\x80-\xbf] [\x80-\xbf]              # U+1000 .. U+CFFF
        | \xed        [\x80-\x9f] [\x80-\xbf]              # U+D000 .. U+D7FF
        | [\xee-\xef] [\x80-\xbf] [\x80-\xbf]              # U+E000 .. U+FFFF
        | \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf]  # U+10000 .. U+3FFFF
        | [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf]  # U+40000 .. U+FFFFF
        | \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf]  # U+100000 .. U+10FFFF
      )
    }x;

This has proven useful as I search for errant Latin-1 characters embedded in some files: any byte the pattern cannot consume is not part of a well-formed UTF-8 sequence, so a Latin-1 byte such as 0xE9 (é) sitting between ASCII characters stands out.
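A minimal sketch of that search, assuming the files are meant to be UTF-8 and are read as raw bytes. The script layout and the `invalid_utf8_offsets` helper are hypothetical illustrations, not part of perlunicode. At each position the loop consumes one whole well-formed character if it can, and otherwise captures a single stray byte and records its offset via `@-`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One well-formed UTF-8 character (the perlunicode pattern above).
my $utf8_char = qr{
    (?:
        [\x00-\x7f]
      | [\xc2-\xdf] [\x80-\xbf]
      | \xe0        [\xa0-\xbf] [\x80-\xbf]
      | [\xe1-\xec] [\x80-\xbf] [\x80-\xbf]
      | \xed        [\x80-\x9f] [\x80-\xbf]
      | [\xee-\xef] [\x80-\xbf] [\x80-\xbf]
      | \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf]
      | [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf]
      | \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf]
    )
}x;

# Hypothetical helper: return the 0-based offsets of every byte in $bytes
# that is not part of a well-formed UTF-8 sequence. $bytes must be a byte
# string (e.g. read through a :raw handle), not decoded characters.
sub invalid_utf8_offsets {
    my ($bytes) = @_;
    my @bad;
    # \G keeps the scan contiguous: either a valid character is consumed,
    # or (.) grabs exactly one stray byte and we note where it started.
    while ($bytes =~ /\G(?:$utf8_char|(.))/gs) {
        push @bad, $-[1] if defined $1;
    }
    return @bad;
}

# Report each stray byte as "file:line: byte 0xNN at byte column N".
# Reading line by line is safe: byte 0x0A never occurs inside a
# multi-byte UTF-8 sequence.
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or do { warn "$file: $!\n"; next };
    while (my $line = <$fh>) {
        for my $off (invalid_utf8_offsets($line)) {
            printf "%s:%d: byte 0x%02X at byte column %d\n",
                $file, $., ord(substr($line, $off, 1)), $off + 1;
        }
    }
    close $fh;
}
```

For a hypothetical file `notes.txt` whose first line is "café" saved as Latin-1, running the script on it would print `notes.txt:1: byte 0xE9 at byte column 4`. One limitation of this approach: Latin-1 text that happens to form a valid UTF-8 sequence (for example "Ã©", bytes C3 A9) passes the pattern and is not reported.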