Help test WWW::Mechanize and its encoding fixes

petdance on 2007-05-22T19:45:46

Kevin Falcone and I have been working on encoding problems with Mechanize. We'd appreciate as many people as possible testing the darn thing.

Suggestions on getting rid of the warnings in the test suite are welcome as well.

Thanks,
xoxo,
Andy
(and Kevin)

Download at $CPAN/authors/id/P/PE/PETDANCE/WWW-Mechanize-1.29_01.tar.gz

Here's what goes wrong:

t/live/wikipedia.........ok 5/15
Parsing of undecoded UTF-8 will give garbage when decoding entities at /Library/Perl/5.8.6/darwin-thread-multi-2level/HTML/PullParser.pm line 83.
Parsing of undecoded UTF-8 will give garbage when decoding entities at /Library/Perl/5.8.6/darwin-thread-multi-2level/HTML/PullParser.pm line 83.
Parsing of undecoded UTF-8 will give garbage when decoding entities at /Library/Perl/5.8.6/darwin-thread-multi-2level/HTML/PullParser.pm line 83.
Parsing of undecoded UTF-8 will give garbage when decoding entities at /Library/Perl/5.8.6/darwin-thread-multi-2level/HTML/PullParser.pm line 83.
Parsing of undecoded UTF-8 will give garbage when decoding entities at /Library/Perl/5.8.6/darwin-thread-multi-2level/HTML/PullParser.pm line 83.
Parsing of undecoded UTF-8 will give garbage when decoding entities at /Library/Perl/5.8.6/darwin-thread-multi-2level/HTML/PullParser.pm line 83.
t/live/wikipedia.........ok 10/15
Parsing of undecoded UTF-8 will give garbage when decoding entities at /Library/Perl/5.8.6/darwin-thread-multi-2level/HTML/PullParser.pm line 83.
Parsing of undecoded UTF-8 will give garbage when decoding entities at /Library/Perl/5.8.6/darwin-thread-multi-2level/HTML/PullParser.pm line 83.
Parsing of undecoded UTF-8 will give garbage when decoding entities at /Library/Perl/5.8.6/darwin-thread-multi-2level/HTML/PullParser.pm line 83.
t/live/wikipedia.........ok
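
For anyone unsure what "undecoded UTF-8" means in that warning: the parser is being handed raw bytes rather than Perl characters. Here is a minimal sketch of the difference, using only the core Encode module. The sample string is made up for illustration and is not taken from the Mechanize test suite.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "o-umlaut" (U+00F6) as it arrives over the wire in a UTF-8 page: two bytes.
my $bytes = "\xC3\xB6";

# Decoding turns those two bytes into a single Perl character.
my $chars = decode('UTF-8', $bytes);

# Byte strings and character strings report different lengths for the same text.
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
```

A parser that decodes entities into characters while the surrounding text is still bytes ends up mixing the two, which is the garbage the warning is talking about.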


relevant links

petdance on 2007-05-22T21:03:05

Relevant links:

My solution

ChrisDolan on 2007-05-23T04:08:58

Here's the solution I employ in my own hacked copy of Test::WWW::Mechanize::Catalyst, which I submitted to RT.

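       # Decode the body using the charset named in the Content-Type header, if any.
       # Assumes Encode has already been loaded (use Encode;).
       my $content_type = $response->header('Content-Type');
       if ($content_type && $content_type =~ m/charset=(\S+)/xms) {
          my $encoding = $1;
          $response->content(Encode::decode($encoding, $response->content));
       }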
So far, that's worked great for me, but then I have the luxury of working almost exclusively in UTF-8 with trustworthy content.

I tried 1.29_01 and got this failure:

t/local/back.............NOK 28/38
#   Failed test '404 check'
#   at t/local/back.t line 149.
#          got: '500'
#     expected: '404'
[... snip ...]
t/local/overload.........skipped
        all skipped: Mysteriously stopped passing, and I don't know why.
[... snip ...]
Failed Test    Stat Wstat Total Fail  List of Failed
------------------------------------------------------------------------ -------
t/local/back.t    1   256    38    1  28
1 test skipped.
but then I do run pretty aggressive firewall settings on my Mac, so it might be me...

When adding 1.29_01 to my @INC, all tests continue to pass for the Catalyst app I'm currently working on, so that's a thumbs up from me.

As for the warning, I've posted a patch which works for the wikipedia test but breaks the o-umlaut test in t/local/follow.t.

Re:My solution

grantm on 2007-05-23T10:10:55

Wouldn't it be better to just use the decoded_content method of the HTTP::Message (Response) object? Or is there some problem with that?
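
For reference, a minimal sketch of what that looks like. The response here is built by hand with a made-up body so the example runs without a network connection; in Mechanize the object would come back from a request instead.

```perl
use strict;
use warnings;
use Encode ();
use HTTP::Response;

# Hypothetical body: "Joerg" spelled with an o-umlaut, encoded as raw UTF-8 bytes.
my $bytes = Encode::encode('UTF-8', "J\x{f6}rg");

# Hand-built response whose Content-Type header names the charset.
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    $bytes,
);

my $raw     = $response->content;          # still the undecoded bytes
my $decoded = $response->decoded_content;  # characters, decoded per the header

printf "raw length: %d, decoded length: %d\n", length($raw), length($decoded);
```

Unlike replacing the content in place, this leaves the original bytes available through content.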

Re:My solution

ChrisDolan on 2007-05-23T12:36:35

Oh, uh, yeah... The significant problem with the function is my ignorance of its existence. :-/

Re:My solution

jibsheet on 2007-05-23T17:04:44

Unfortunately, that makes t/local/nonascii.t break, and that test has worked with Mech for years. (It is derived from a number of tests in the RT test suite.)