Note to self on Tie::File

jonasbn on 2004-08-13T19:23:31

Do NOT use it on large files when doing multiple replacements of file contents, do copying of content instead.

hmmm

jhi on 2004-08-15T19:30:45

> Do NOT use it on large files when doing multiple replacements of file contents, do copying of content instead.

Given Tie::File's original purpose I'd say that what you describe above sounds almost like a bug. I say 'almost' since you didn't really explain what the problem was...

Re:hmmm

jonasbn on 2004-08-16T07:10:24
Ok, this is the script:
#!/usr/bin/perl -w
use strict;
use Tie::File;
my @array;
tie @array, 'Tie::File', $ARGV[0]
    or die "Unable to tie file: $ARGV[0] - $!";
foreach (@array) {
    print STDERR "Processing: $_\n";
    if (m/^<Seg L=/) {
        s/"?,\s*$//;
        s/>"/>/;
        print STDERR "Processed: $_\n";
    }
}
untie(@array);
exit(0);
I need to process a lot of lines, which look like this:
<Seg L=EN-GB>"China PR",
Resulting in this:
<Seg L=EN-GB>China PR
The file is about 5 megabytes in size... and a rough estimate says 1000 lines in 5 minutes, and I have 127381 lines...
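For comparison, the "copying of content" approach mentioned at the top of the thread could look roughly like the sketch below. It streams the input into a new file and renames the result over the original, so no line is ever rewritten in place. The sub names `clean_line` and `rewrite_by_copy` and the `.tmp` suffix are made up for illustration; the two substitutions are the ones from the script above.

```perl
use strict;
use warnings;

# Apply the two substitutions from the script above to one chomped line.
sub clean_line {
    my ($line) = @_;
    if ($line =~ m/^<Seg L=/) {
        $line =~ s/"?,\s*$//;    # drop the trailing quote and comma
        $line =~ s/>"/>/;        # drop the quote right after the tag
    }
    return $line;
}

# Stream $src into a new file, then rename it over the original.
sub rewrite_by_copy {
    my ($src) = @_;
    my $tmp = "$src.tmp";        # hypothetical temporary file name

    open my $in,  '<', $src or die "Cannot read $src: $!";
    open my $out, '>', $tmp or die "Cannot write $tmp: $!";

    while (my $line = <$in>) {
        chomp $line;             # so that \s*$ cannot swallow the newline
        print {$out} clean_line($line), "\n";
    }

    close $in;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $src or die "Cannot rename $tmp to $src: $!";
    return;
}

# Only touch a file when one is named on the command line.
rewrite_by_copy($ARGV[0]) if @ARGV;
```

Each line is read and written exactly once, so the running time should grow linearly with the size of the file.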

Re:hmmm

jhi on 2004-08-16T08:49:29
Firstly, you could redo this
if (m/^<Seg L=/) {
    s/"?,\s*$//;
    s/>"/>/;
    print STDERR "Processed: $_\n";
}
as
s/^(<Seg L=.+>)"(.+)",\s*$/$1$2/ && print ...
to replace one m and two s's with one s.
Secondly, doing at least one STDERR print for every line, and two for matching lines, is probably not cheap.
Thirdly, if you can do without those prints you could just use perl -pi.bak -e 's/.../.../'.
Fourthly, there are also various parameters for Tie::File that could affect the result quite a bit, see e.g. the deferring options.
Fifthly, it would still be nice to know why Tie::File is not fast enough. If you could profile this (e.g. with -d:DProf and dprofpp), that would be nice.
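As a rough sketch of how the first and fourth suggestions might be combined: the code below uses the single substitution and switches on Tie::File's deferred writing with `defer` before the loop, then writes everything out with `flush`. The sub name `fix_with_tie` is made up for illustration, and whether this actually removes the slowdown on a 5 MB file is an assumption that would need timing. The guess behind it is that each shortened line otherwise forces Tie::File to move the rest of the file.

```perl
use strict;
use warnings;
use Tie::File;

# Rewrite matching lines in place, batching the writes.
sub fix_with_tie {
    my ($path) = @_;

    my $tie = tie my @lines, 'Tie::File', $path
        or die "Unable to tie file: $path - $!";

    $tie->defer;    # hold modifications in memory instead of writing each one

    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        # Store the line back only when the substitution changed it.
        $lines[$i] = $line
            if $line =~ s/^(<Seg L=.+>)"(.+)",\s*$/$1$2/;
    }

    $tie->flush;    # write the deferred changes to the file
    undef $tie;     # drop the object reference before untie
    untie @lines;
    return;
}

# Only touch a file when one is named on the command line.
fix_with_tie($ARGV[0]) if @ARGV;
```

Note that the single substitution requires the closing quote and comma to be present, whereas the original pair of substitutions treated the quote before the comma as optional.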

Re:hmmm

jonasbn on 2004-08-16T13:22:41
Thanks for the advice,

I think I will attempt to time the various versions.

I will profile the script as well,

Again thanks :)