To say that Text::CSV_XS has trouble with Mac line-endings (\015) is something of an understatement. Not only will it not parse a file that uses them to end lines, it won't even allow them inside a field in binary mode. Binary mode is advertised as allowing any character as long as it's in a quoted string, so this is clearly (in my opinion) a bug.
I dug into the code intending to solve both problems, but only managed to fix the latter. Actually supporting \015 as a line-ending character looks like it would be hard. For my purposes it wouldn't help unless it was automatic - if I have to tell Text::CSV_XS that a file has Mac line-endings then I might as well just translate them. That's the way Unix and Windows line-endings work now - you don't have to tell the module what to expect and you can even mix them in a single file. The way it accomplishes this feat doesn't extend well though, at least as far as I can tell.
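For what it's worth, the "just translate them" route can be sketched in a few lines of plain Perl. This is only an illustration and not part of the patch: normalize_eol is a made-up helper name, and it assumes the whole file fits in memory as a single string. It rewrites every bare \015 as \012 and leaves \015\012 pairs alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::CSV_XS): turn bare CR (\015)
# line endings into LF (\012). A CR that is followed by LF is left alone,
# so Windows-style CRLF endings pass through unchanged.
sub normalize_eol {
    my ($text) = @_;
    $text =~ s/\015(?!\012)/\012/g;
    return $text;
}

# Example: a two-line CSV fragment with Mac line endings.
my $mac_csv  = "a,b\015c,d\015";
my $unix_csv = normalize_eol($mac_csv);
```

Note that this is a blunt instrument: it also rewrites any \015 that sits inside a quoted field, which is exactly the data the binary-mode fix is meant to preserve.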
In any case, here's the bug fix: mac.diff. After you apply it you should find that stray \015 characters work just fine in binary mode. I also sent it to the maintainer, but since the module hasn't had a release in 5 years I'm not exactly holding my breath!
I came very close to going on an optimization mission while I was in the code. The state machine looks like it could benefit from some tweaking and the way lines are read looks like it could be improved. This would be pretty foolish though - Text::CSV_XS is already so fast that I've never seen it show up in a profile on a serious app. Usually I'm reading CSVs so I can load data into a database via DBI, by which point Text::CSV_XS is unlikely to be a bottleneck.
-sam
Re:New maintainer?
samtregar on 2006-10-30T03:28:55
Thanks, I'll send the patch his way.
-sam
Re:PAUSE admins can help
samtregar on 2006-10-31T03:54:35
So noted. I think he's in - it's listed on his CPAN author page as a module registered to him. I haven't heard back from him yet, but I imagine I will soon.
-sam
Hi Sam,
I know I am late to the party, but try installing PerlIO::eol and give the following a try:
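    use Text::CSV_XS;

    # binary => 1 allows binary characters, including embedded newlines, inside quoted fields
    my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
    # the :eol(Native) layer from PerlIO::eol converts line endings as the file is read
    open my $io, "<:raw:eol(Native)", $filename or die "$filename: $!";
    while (my $row = $csv->getline($io)) {
        ...
    }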
Text::CSV_XS can read from an IO::Handle object, and that handle can convert line endings on the fly for you using PerlIO::eol. This has the added benefit of handling fields with embedded newlines.