Hey internet, ⠸⠙⠱ ⠝⠉⠁⠈ ⠅⠝⠁⠕⠕⠉⠃ ⠝⠆⠏⠍⠞?
A year or more ago I was fixing work's web site to handle Unicode entered by users into form fields. We don't use CGI.pm because...? Well, OK, we just don't. It doesn't handle Unicode properly either, or at least almost no version of it does. Huh.
If a user types "Coatıcook", you'll probably get the dotless "i" as either %C4%B1 or %u131, but the CGI.pm that ships with perl mostly won't do anything reasonable with it. A quick check across a few perls:
for v in 5.11.3 5.10.1 5.10.0 5.8.9 5.6.2; do
  /opt/perl-$v-64-thr-dbg/bin/perl -le 'use CGI;
    my $input = "a=%u2021"; my $expect = "\x{2021}"; my $got = CGI->new( $input )->param( "a" );
    print $expect eq $got ? "ok $] $CGI::VERSION" : "not ok $] $CGI::VERSION"'
done
CGI.pm decodes the non-standard %uXXXX escape (invalid according to RFC 3986) into a UTF-8 octet string, but it doesn't decode that into a Perl character string. I think the current behavior is desirable, since the data can contain any octets in any encoding.
--
chansen
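A minimal sketch of doing the same thing by hand: turn each %uXXXX escape into the UTF-8 octets of that code point (the step described above), turn each %XX into a single octet, then run the result through Encode's decode_utf8 to get a Perl character string. decode_form_value is a hypothetical helper, not part of CGI.pm. It assumes the client sent UTF-8 and that %u escapes carry four hex digits, as JavaScript's escape() emits them (e.g. %u0131); surrogate pairs for characters outside the BMP are not recombined.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical helper: decode one raw form value that may mix standard %XX
# escapes with escape()-style %uXXXX escapes. Assumes UTF-8 input and BMP
# characters only (surrogate pairs are not combined).
sub decode_form_value {
    my ($raw) = @_;
    (my $s = $raw) =~ tr/+/ /;    # form encoding: '+' stands for a space

    # One pass, so an escape's output is never re-scanned as another escape:
    #   %uXXXX -> UTF-8 octets of that code point
    #   %XX    -> a single octet
    $s =~ s{%(?:u([0-9A-Fa-f]{4})|([0-9A-Fa-f]{2}))}
           {defined $1 ? encode_utf8(chr hex $1) : chr hex $2}ge;

    # UTF-8 octets -> Perl character string
    return decode_utf8($s);
}

for my $raw ('%C4%B1', '%u0131') {
    my $got = decode_form_value($raw);
    printf "%-8s -> %s\n", $raw, $got eq "\x{131}" ? 'ok' : 'not ok';
}
```

Keeping the octets-to-characters step separate is what lets the charset be swapped if a site ever has to accept something other than UTF-8.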
> %u131
What sort of encoding is that? I mean, I can see it's the Unicode codepoint preceded by %u, but which standard backs this? I've never encountered this before.
Here's my take on it:
use CGI qw();
use Encode qw(decode_utf8);
my $input = 'a=%C4%B1';
my $expect = "\x{131}";
my $got = decode_utf8(CGI->new($input)->param('a'));
# as per best practice http://search.cpan.org/perldoc?CGI#-utf8
use Devel::Peek qw(Dump); Dump $expect; Dump $got;
print $expect eq $got
? "ok $] $CGI::VERSION"
: "not ok $] $CGI::VERSION"
__DATA__
SV = PV(0x88bc40) at 0x8c12f8
REFCNT = 1
FLAGS = (PADMY,POK,pPOK,UTF8)
PV = 0x8aaad0 "\304\261"\0 [UTF8 "\x{131}"]
CUR = 2
LEN = 8
SV = PV(0xac9e60) at 0x8c13e8
REFCNT = 1
FLAGS = (PADMY,POK,pPOK,UTF8)
PV = 0xad5740 "\304\261"\0 [UTF8 "\x{131}"]
CUR = 2
LEN = 8
ok 5.010001 3.48
Re: Unicode URLs, wtf?
Hansen on 2010-01-07T12:50:53
It usually comes from broken JavaScript applications that use escape() instead of encodeURI():
escape("\u263A") -> %u263A
encodeURI("\u263A") -> %E2%98%BA
--
chansen
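To make the difference concrete, here is a rough Perl emulation of the two JavaScript functions. js_escape_like and encode_uri_like are hypothetical names; the sets of characters they leave untouched are assumed from the ECMAScript definitions of escape() and encodeURI(), and only BMP characters are handled (real escape() works on UTF-16 code units).

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Rough emulation of escape(): characters outside its safe set become
# %XX (code point <= 0xFF) or %uXXXX (above 0xFF). BMP only.
sub js_escape_like {
    my ($str) = @_;
    return join '', map {
        my $cp = ord;
        /\A[A-Za-z0-9\@*_+\-.\/]\z/ ? $_
          : $cp > 0xFF               ? sprintf('%%u%04X', $cp)
          :                            sprintf('%%%02X', $cp);
    } split //, $str;
}

# Rough emulation of encodeURI(): encode to UTF-8 first, then
# percent-escape every octet outside its reserved/unreserved set.
sub encode_uri_like {
    my ($str) = @_;
    my $octets = encode_utf8($str);
    $octets =~ s{([^A-Za-z0-9;,/?:\@&=+\$\-_.!~*'()#])}{sprintf('%%%02X', ord $1)}ge;
    return $octets;
}

for my $str ("\x{263A}", "Coat\x{131}cook") {
    printf "escape: %s  encodeURI: %s\n", js_escape_like($str), encode_uri_like($str);
}
```

The encodeURI() output is plain RFC 3986 percent-encoding of UTF-8 octets, so any decoder can handle it; the %u form only works with servers that special-case it.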
Did you try using 'use CGI qw/-utf8/'?
Re: -utf8?
jjore on 2010-01-09T05:14:19
Nope. I'd never noticed the option. My bad!
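For completeness, a minimal sketch of the -utf8 pragma in use, assuming a CGI.pm recent enough to support it (the 3.48 in the output above is). CGI.pm was dropped from the perl core in 5.22, so on newer perls it has to be installed from CPAN.

```perl
use strict;
use warnings;
use CGI qw(-utf8);   # ask CGI.pm to decode parameter values as UTF-8

my $q   = CGI->new('a=%C4%B1');
my $got = $q->param('a');    # scalar context: first value only
print $got eq "\x{131}" ? "ok $CGI::VERSION\n" : "not ok $CGI::VERSION\n";
```

The CGI.pm documentation cautions that -utf8 can interfere with binary file uploads and suggests decoding individual fields with Encode instead, which is the decode_utf8 approach from the earlier reply.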