On day 2 of the YAPC::Asia hackathon, we discovered a kanji that we could use for Moose.pm: 箆. It even looks a bit like something with antlers. Here is the JEDict definition:
箆 [へら: HERA] spatula
箆鹿 [へらじか: HERAJIKA] (uk) moose, elk
So a moose is an animal with a spatula on its head? Larry said that 箆 could also mean "comb", so a moose is an animal with a comb on its head. But I digress.
I've made a simple proof-of-concept distribution. The main module's code basically is:
use utf8;
package 箆;
1;
'perl Makefile.PL', 'make', 'make test' and 'make dist' all worked well. It produced a tarball: 箆.tar.gz. So far, so good.
Then I tried to upload the tarball to CPAN:
$ cpan-upload-http 箆.tar.gz
That also gave no error message. So I waited for the emails from PAUSE. Here is the first one:
> Subject: Notification from PAUSE
>
> MARCEL (Marcel Grünauer == hanekomu (跳ね込む)) visited the PAUSE and
> requested an upload into his/her directory. The request used the
> following parameters:
>
> pause99_add_uri_upload ààààààààààà[çî-0.01.tar.gz]
> SUBMIT_pause99_add_uri_httpupload [ Upload this file from my disk ]
> pause99_add_uri_httpupload [-0.01.tar.gz]
>
> The request is now entered into the database where the PAUSE daemon will
> pick it up as soon as possible (usually 1-2 minutes).
>
> During upload you can watch the logfile in
> https://pause.perl.org/pause/authenquery?ACTION=tail_logfile&pause99_tail_logfile_1=5000.
>
> You'll be notified as soon as the upload has succeeded, and if the
> uploaded package contains modules, you'll get another notification from
> the indexer a little later (usually within 1 hour).
>
>
> Thanks for your contribution,
> --
> The PAUSE
Due to the good fortune of having kanji and kana in my CPAN display name (跳ね込む), I was sure that the email had the right encoding. But the tarball filename it returned was wrong: "çî-0.01.tar.gz". Well, let's see. Maybe it's just a problem with generating the email. Wait and see... Here is the second email:
> Subject: CPAN Upload: M/MA/MARCEL/-0.01.tar.gz
>
> The uploaded file
>
>     -0.01.tar.gz
>
> has entered CPAN as
>
>   file: $CPAN/authors/id/M/MA/MARCEL/-0.01.tar.gz
>   size: 24392 bytes
>    md5: 39ebd0b5ab4bd9acdb2a80c1c4a733dd
>
> No action is required on your part
> Request entered by: MARCEL (Marcel Grünauer == hanekomu (跳ね込む))
> Request entered on: Sun, 18 May 2008 11:00:42 GMT
> Request completed:  Sun, 18 May 2008 11:00:52 GMT
>
> Thanks,
> --
> paused, v996
Hm, doesn't look good either. Well, wait for the indexer report... Here it is:
> Subject: Failed: PAUSE indexer report MARCEL/-0.01.tar.gz
>
> The following report has been written by the PAUSE namespace indexer.
> Please contact modules@perl.org if there are any open questions.
>   Id: mldistwatch 1001 2008-05-15 05:34:01Z k
>
>                User: MARCEL (Marcel Gruenauer == hanekomu)
>   Distribution file: -0.01.tar.gz
>     Number of files: 21
>          *.pm files: 14
>              README: 箆-0.01/README
>            META.yml: 箆-0.01/META.yml
>   Timestamp of file: Sun May 18 11:00:51 2008 UTC
>    Time of this run: Sun May 18 11:02:23 2008 UTC
>
> No package statements could be found in the distro (maybe a script or
> documentation distribution?)
>
> __END__
Ah, so it could actually untar it, since it found the README and META.yml files. But it seems to use the wrong regex to find the package statements. Maybe it needs to read the file as UTF-8...
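To illustrate that guess (and it is only a guess, I have not looked at the indexer's source): here is a minimal sketch of how a \w-based pattern can miss "package 箆;" when the line is read as raw bytes and find it once the bytes are decoded. The pattern and the find_package helper are hypothetical, not what PAUSE actually uses.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The line "package 箆;" as raw UTF-8 bytes, i.e. what you get when the
# file is read without a decoding layer (箆 is U+7B86 = E7 AE 86).
my $bytes = "package \xE7\xAE\x86;";

# Hypothetical pattern; the real indexer's regex may well differ.
my $package_re = qr/^\s*package\s+(\w[\w:]*)\s*;/;

# Hypothetical helper: return the package name found in a line, or undef.
sub find_package {
    my ($line) = @_;
    return $line =~ $package_re ? $1 : undef;
}

# On an undecoded byte string, \w only matches ASCII word characters,
# so the high bytes of the kanji do not count as part of a name.
my $from_bytes = find_package($bytes);

# After decoding, the string holds the single character U+7B86,
# which \w does match.
my $from_chars = find_package(decode('UTF-8', $bytes));

printf "raw bytes: %s\n", defined $from_bytes ? 'match' : 'no match';
printf "decoded:   U+%04X\n", ord $from_chars if defined $from_chars;
```

If the indexer does something along these lines, decoding the file contents before matching would be enough to find the package statement.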
Just to be sure it wasn't a problem with cpan-upload-http, I also uploaded the file from the PAUSE web interface directly. Here is the response:
> The Perl Authors Upload Server
>
> Upload a file to CPAN
> Add a file for MARCEL
> File successfully copied to '/home/ftp/incoming/-0.01.tar.gz'
> Your filename has been altered as it contained characters besides the class [A-Za-z0-9_\-\.\@\+].
> DEBUG: your filename['çî-0.01.tar.gz'] corrected filename['-0.01.tar.gz'].
So this doesn't work either.
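The renaming is easy to reproduce. Here is a minimal sketch that assumes PAUSE simply deletes every byte outside the character class quoted in its message; the sanitize_filename helper is hypothetical, only the class itself comes from the response above.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every byte that is not in the class
# PAUSE names in its message, [A-Za-z0-9_\-\.\@\+].
sub sanitize_filename {
    my ($name) = @_;
    (my $clean = $name) =~ s/[^A-Za-z0-9_\-\.\@\+]//g;
    return $clean;
}

# The tarball name as UTF-8 bytes (箆 is E7 AE 86).
my $uploaded = "\xE7\xAE\x86-0.01.tar.gz";
my $stored   = sanitize_filename($uploaded);

print "$stored\n";    # prints "-0.01.tar.gz"
```

All three bytes of the kanji fall outside the class, so nothing of the distribution name survives and only the version and extension are left.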
Dear CPAN maintainers, please fix it.
Fortunately, CPAN filenames are 7-bit, and only a few non-\w characters are allowed in them. As much as I'd like to change that, CPAN has to be the advocate of the limitations that the last poor soul in the chain, who just wants to use CPAN, has to fight with.
I'm sure you know how to give the tarball a different name. I'm not so sure that perl is able to deal with UTF-8 module names. So please bring your first UTF-8 module into the core. I'm looking forward to seeing how it fares.
Re:unidecode, transcribe, timtowdti
rafael on 2008-05-19T06:17:40
Regarding UTF-8 variable names, the core doesn't support them very well. See this item in perltodo.pod:
=head2 Properly Unicode safe tokeniser and pads.
The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
variable names are stored in stashes as raw bytes, without the utf-8 flag
set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
source filters. All this could be fixed.
Re:unidecode, transcribe, timtowdti
schwigon on 2008-05-26T07:48:38
But package names work quite OK. I did a similar experiment at nearly the same time with Acme::Ünicöde. Locally on disk it just works fine. Once uploaded, it is even installable via CPAN.pm by the dist name (e.g. S/SC/SCHWIGON/acme-unicode/Acme-nicde-0.02.tar.gz). The filename problem is solvable anyhow during or after "make dist".
The only remaining problem is getting the PAUSE indexer to use the META.yml instead of trying its luck on the files, and generating that META.yml with some more explicit information than usual.
Probably.