Mac line-endings and Text::CSV_XS

samtregar on 2006-10-29T22:44:29

To say that Text::CSV_XS has trouble with Mac line-endings (\015) is something of an understatement. Not only will it not parse a file that uses them to end lines, it won't even allow them inside a field in binary mode. Binary mode is advertised as allowing any character as long as it's in a quoted string, so this is clearly (in my opinion) a bug.

I dug into the code intending to solve both problems, but only managed to fix the latter. Actually supporting \015 as a line-ending character looks like it would be hard. For my purposes it wouldn't help unless it was automatic - if I have to tell Text::CSV_XS that a file has Mac line-endings then I might as well just translate them. That's the way Unix and Windows line-endings work now - you don't have to tell the module what to expect and you can even mix them in a single file. The way it accomplishes this feat doesn't extend well though, at least as far as I can tell.
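For reference, the "just translate them" workaround can be sketched in a few lines of plain Perl. This is only an illustration: normalize_eol is a hypothetical helper name, not something Text::CSV_XS provides, and it assumes the whole file fits in memory as a single string.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::CSV_XS: rewrite DOS (\015\012)
# and Mac (\015) line endings to Unix (\012) before handing data to a parser.
sub normalize_eol {
    my ($text) = @_;
    # Match \015 optionally followed by \012 so a DOS pair becomes one \012
    $text =~ s/\015\012?/\012/g;
    return $text;
}

my $mac_data = "a,b\015c,d\015";
my $unix_data = normalize_eol($mac_data);
```

Note that this rewrites every \015, including any that sit inside quoted fields, so it's no substitute for fixing the binary mode bug.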

In any case, here's the bug fix: mac.diff. After you apply it you should find that stray \015 characters work just fine in binary mode. I also sent it to the maintainer, but since the module hasn't had a release in 5 years I'm not exactly holding my breath!

I came very close to going on an optimization mission while I was in the code. The state machine looks like it could benefit from some tweaking and the way lines are read looks like it could be improved. This would be pretty foolish though - Text::CSV_XS is already so fast that I've never seen it show up in a profile on a serious app. Usually I'm reading CSVs so I can load data into a database via DBI, by which point Text::CSV_XS is unlikely to be a bottleneck.

-sam


New maintainer?

bart on 2006-10-30T03:20:35

I thought I heard that Jeff Zucker, AKA jZed on Perlmonks, was set on taking over this module, but I think "he didn't find it worthwhile yet to release a new version". That last bit is paraphrased from something he told me in the Chatterbox. I'm not sure any more that this was the module he was talking about, but I think it was.

So... Ask him?

Re:New maintainer?

samtregar on 2006-10-30T03:28:55

Thanks, I'll send the patch his way.

-sam

PAUSE admins can help

brian_d_foy on 2006-10-31T01:13:52

If Jeff doesn't want to take over the module, I can apply the patch and release a new version. I don't want to maintain it, but I do help modules find new homes. :)

Re:PAUSE admins can help

samtregar on 2006-10-31T03:54:35

So noted. I think he's in - it's listed on his CPAN author page as a module registered to him. I haven't heard back from him yet, but I imagine I will soon.

-sam

Use PerlIO to get around the problem

cees on 2007-07-05T06:12:19

Hi Sam,

I know I am late to the party, but try installing PerlIO::eol and give the following a try:

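use Text::CSV_XS;

# binary => 1 allows arbitrary characters inside quoted fields
my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
# The :eol(...) layer comes from PerlIO::eol and converts line endings as the file is read
open my $io, "<:raw:eol(Native)", $filename or die "$filename: $!";
while (my $row = $csv->getline($io)) {
  ...
}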

Text::CSV_XS can read from an IO::Handle object, and the IO::Handle object can convert line endings on the fly for you using PerlIO::eol. This has the added benefit of handling fields with embedded newlines.