Slap Dash FileCache

2shortplanks on 2002-06-28T11:11:20

So I need to open a vast number of files. No problem, I think to myself, I'll use the FileCache module. So I run my maketry script (which spits out a file with a "#!/usr/bin/perl" line at the top, turns strict and warnings on, and loads a bunch of standard modules like File::Spec, IO::File and friends, before opening it in vi) and type (well, cut and paste) out the example from the Perl Cookbook.

$path = "bar";

use FileCache;
cacheout ($path);         # each time you use a filehandle
print $path "output";
And it falls over with the wonderful error message:
Can't use string ("bar") as a symbol ref while "strict refs" in use at 
filecache line 22.
So I spend ages trying to work out why the code isn't replacing $path with a ref to a filehandle like it should, and why it's leaving a bare string in there. I post to London.pm. I ask on IRC. I'm stumped...and then I look at the source. Of course! FileCache doesn't create a reference and store it in $path. It opens a filehandle in the calling package with the same name as the contents of $path. This means that $path in the print statement is treated as a symbolic reference, that is, as the name of a filehandle, and so finds the right one.

Okay, this is quite neat, but really really dangerous. use strict quite rightly barfs all over the place when I do this and I have to turn it off.
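To make the mechanism concrete, here is a minimal sketch that leaves FileCache out entirely. It opens a package filehandle whose name is the path string (an approximation of what cacheout does for the caller, not its actual code), then shows the same print failing under strict refs and working once strict refs is relaxed for one block. The temporary directory and variable names are only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/bar";

# Open a package filehandle whose *name* is the path string.
{
    no strict 'refs';
    open(*{"main::$path"}, '>', $path) or die "Can't create $path: $!";
}

# Under strict refs, using the string as a filehandle is a fatal runtime error.
my $ok    = eval { print $path "output\n"; 1 };
my $error = $ok ? '' : $@;

# With strict refs relaxed for this block only, the string finds the handle.
{
    no strict 'refs';
    print $path "output\n";
    close $path or die "Can't close $path: $!";
}

print "strict refs said: $error";
```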

So it looks like I'm going to have to write my own module. Anyone got any better suggestions? Alternative modules on CPAN to do this?


DIY

jdporter on 2002-06-28T12:16:47

So, fix the module and send the patch to the author.

And if she isn't responsive, upload the new version yourself!

Re:DIY

2shortplanks on 2002-06-28T12:50:09

I can't think of a way of fixing the module that won't break backwards compatibility (it's a core module too).

Ummm

belg4mit on 2002-06-29T21:37:21

Umm, this is really no different than if you took
FileCache out of the picture: string filehandles
just don't work under strict refs. They're
essentially a form of symbolic reference. If
you know what you're doing it's perfectly
reasonable to run with no strict 'refs'.

PS> I patched FileCache for perl 5.8, which still
has this "problem", but more features. If you
don't want to get all of 5.8 you can retrieve the
module from ftp://pthbb.org/pub/pm/FileCache/

Enjoy

Re:Ummm

2shortplanks on 2002-06-30T01:30:00

Excellent. This is what I love about Perl - if you waffle on about something for long enough you'll always find someone has already had a go at fixing it ;-)

As for running without use strict, I'm not sure I trust myself that much ;-) I guess I could just turn it off lexically for the tight loop, but I get really worried about flicking the safety off the loaded gun pointing at my feet.
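One way to keep the safety catch on almost everywhere is to confine the symbolic reference to a tiny helper. The sketch below assumes the FileCache that ships with perl 5.8 or later and uses only its documented one-argument cacheout; write_record and the bucket file names are hypothetical, made up for the example.

```perl
use strict;
use warnings;
use FileCache;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Hypothetical helper: the only place where symbolic references are allowed.
sub write_record {
    my ($path, $record) = @_;
    cacheout($path);        # make sure the handle named after $path is open
    no strict 'refs';       # lexical: ends with this sub's block
    print {$path} "$record\n";
    return;
}

# The tight loop itself stays fully strict.
my @paths = map { "$dir/bucket$_" } 0 .. 2;
write_record($paths[$_ % 3], "entry $_") for 0 .. 8;

# Close the handles explicitly so everything is flushed to disk.
{
    no strict 'refs';
    for my $path (@paths) {
        close $path or die "Can't close $path: $!";
    }
}
```

Everything outside write_record and the final block still runs under full strict, so a stray string used as a handle elsewhere is caught as before.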

Anyway, I've been busy since I posted my journal - I had a go at implementing my own modified version of FileCache which is designed not to be backwards compatible. This is at fc2.

Glancing at the source code of your new version I see you've added one of the features that I wanted (allowing multiple files) and moved the checking for how many files you can open into a more sensible place.

The new version of FileCache still has what I would consider two bugs in it. Firstly, from what I can see it probably runs into problems when you change the cwd, returning the wrong filehandle (I used File::Spec's rel2abs function to get around that - I have no idea how efficient that is.) The second is the

  $isopen{$file} = ++$cacheout_seq;
line. To quote my own pod:
When this needs to close a filehandle it will close a third of all open filehandles, choosing those that were requested the longest ago. To do this it maintains a record of the least recently used files: a hash in which it stores an ever-incrementing value each time cacheout is called. If you call cacheout more times than your perl can store in an IV (er, about four billion) then at some point this ordering will no longer work and random filehandles will be closed.
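The bookkeeping that pod describes can be sketched roughly as follows. This is not the code from FileCache or from fc2; it only tracks names and sequence numbers, with no real filehandles, and the limit of six and the names touch, %last_used and @closed are invented for the illustration.

```perl
use strict;
use warnings;

my %last_used;   # name => sequence number of its most recent request
my $seq = 0;     # ever-incrementing counter
my $max = 6;     # illustrative limit on "open" handles
my @closed;      # names evicted so far, in order

sub touch {
    my ($name) = @_;
    if (!exists $last_used{$name} && keys(%last_used) >= $max) {
        # Oldest request first.
        my @by_age = sort { $last_used{$a} <=> $last_used{$b} } keys %last_used;
        my $count  = int(@by_age / 3) || 1;   # a third, but at least one
        for my $victim (@by_age[0 .. $count - 1]) {
            delete $last_used{$victim};       # a real cache would close it here
            push @closed, $victim;
        }
    }
    $last_used{$name} = ++$seq;
    return;
}

touch($_) for qw(a b c d e f);
touch('a');      # 'a' becomes the most recently used
touch('g');      # table is full, so the oldest third goes
print "closed: @closed\n";
```

The ordering depends entirely on the comparison of the stored sequence numbers, which is why the behaviour of ++ at very large values matters for the discussion that follows.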

Oh, and if this wasn't enough to worry about I found another wishlist item while I was panicking about a run today - I want a FileCache module that works with PerlIO::gzip too....

Re:Ummm

belg4mit on 2002-06-30T01:46:18

Not no strict, just no strict refs.

I'll consider adding a line to the CAVEATS about absolute paths. It's really something the user should just keep in mind; again, I think of FileCache (not that I am the original author) as simply an abstraction over open. It has one sole purpose, and that is not to catch every tiny faux pas the user might commit ;-)

As for the roll-over, that wouldn't be an issue. So the file gets closed, and then might soon be reopened. If you look closely, the hash element is deleted when the file gets closed; this reduces the already infinitesimal chance that this unexpected but otherwise insignificant behavior will occur ;-) Besides which, one might point out that you run the same risk with your use of time, potentially more so. And of course an additional risk of UNIX time roll over...

Finally, I think some of the criticisms in your POD (those not addressed here) are only pertinent to the older version.

Re:Ummm

2shortplanks on 2002-06-30T09:52:00

Not no strict, just no strict refs.
Good point. Should have thought about that a bit more.
I think of FileCache (not that I am the original author) as simply an abstraction over open.
Yeah, though if I use open to open a second file with the same name from a different directory then it opens a different file. FileCache will use the first file it opened.

However, I can't think of any way to work around this with the current method of creating filehandles in the parent's namespace, which is needed for backwards compatibility.

I agree the best way to go would be to stick it in the CAVEATS section - user beware ;-)
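For a cache that isn't tied to the old interface, the cwd problem can be sidestepped by keying on the absolute path. A minimal sketch using File::Spec's rel2abs, showing that the same relative name resolves to different keys after a chdir (the two temporary directories stand in for real working directories):

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;
use File::Temp qw(tempdir);

my $start = getcwd();
my $dir_a = tempdir(CLEANUP => 1);
my $dir_b = tempdir(CLEANUP => 1);

# Resolve the same relative name from two different working directories.
chdir $dir_a or die "Can't chdir to $dir_a: $!";
my $key_a = File::Spec->rel2abs('bar');

chdir $dir_b or die "Can't chdir to $dir_b: $!";
my $key_b = File::Spec->rel2abs('bar');

chdir $start or die "Can't chdir back to $start: $!";

# A cache keyed on these strings would treat them as two separate files.
print "key from first directory:  $key_a\n";
print "key from second directory: $key_b\n";
```

The cost is one rel2abs call per lookup, which is worth benchmarking before relying on it in a tight loop.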

Besides which, one might point out that you run the same risk with your use of time, potentially more so. And of course an additional risk of UNIX time roll over...
Nah...I'm processing 5 billion plus entries (umpteen odd a second), each of which causes a call to cacheout. Essentially the problem is that once we get to a certain point ++ doesn't work anymore, right? So all files are equally likely to get swapped out. This isn't a problem for files that can be reopened and appended to efficiently, but I'm looking at doing the same for compressed files etc., where closing them is a big deal.

Of course, I'm stumped for the solution. One option might be to allow other selection criteria. Other modules I've coded that do a similar kind of thing (in a pipeline type system) used a FIFO algo based on which file was opened first. This may or may not (insert much hand waving here) be the correct way to do it.

I think FileCache is probably best just declaring this as a known problem for now (for 5.8.0 anyhow.) There's lots of complicated stuff here, and I guess the best idea would be to steal ideas from the OS people - this isn't a new problem. But that's going to take time, and a quick documentation change is probably all that's needed for the latest version.

Thanks a lot for your help

Re:Ummm

Matts on 2002-06-30T10:09:18

An IV (int) will roll over to an NV (double) at overflow point.

While ++ will not then have exactly the same semantics, it will keep the value increasing, and not have the "random" effect you describe.

Re:Ummm

2shortplanks on 2002-06-30T13:31:07

So does that mean that a ++ will potentially increase the value by more than one, until we reach +inf?

Interesting - does this mean that ++ and += 1 are not always the same operation?

Re:Ummm

richardc on 2002-07-01T13:55:30

++ and += 1 are always different ops, try perl -MO=Terse -e '$foo++; $bar += 1;' and note the ops used

++ gets folded to PP(preinc), which will do a simple ++SvIVX(sv) if it's a clean IV.
If it's not, it calls sv_inc(sv), which does checks for magic and, if the value is an NV, performs SvNVX(sv) += 1.0;. How long the sv retains a value like a tight-jawed integer will be dependent on the implementation of the floating point code.
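A small sketch of the NV case, assuming a perl whose NV is an IEEE 754 double (the usual build, but an assumption here). Once a counter is stored as a float at 2**53, adding 1.0 gives a result that cannot be represented and rounds back, so ++ leaves the counter where it was rather than skipping ahead. The variable names are illustrative.

```perl
use strict;
use warnings;

# Newer perls may warn about lost precision here; collect warnings quietly.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $limit   = 9007199254740992.0;   # 2**53, written as a float so it is an NV
my $counter = $limit;

# Each ++ adds 1.0 to the NV; the exact result is not representable.
$counter++ for 1 .. 1000;

my $stuck = $counter == $limit ? 1 : 0;
print "counter still at 2**53 after 1000 increments: ",
    ($stuck ? "yes" : "no"), "\n";
```

For a least-recently-used ordering this would mean that every handle touched after that point gets the same sequence number, so the choice among them is no longer meaningful.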

Re:Ummm

belg4mit on 2002-06-30T02:01:21

If it were to be released for general consumption
I think a better name would be IO::FileCache.

One could also subclass FileCache to provide the
rel2abs masking.

Re:Ummm

2shortplanks on 2002-06-30T09:54:02

I was thinking more IO::HandleCache. Strictly, I'm dealing with handles, not files, now - it also avoids confusion with the deprecated File::Cache module.


I need to tidy everything up a bit first and maybe run some benchmarks.


Fun.

Re:Ummm

belg4mit on 2002-06-30T16:06:19

It's FileCache not File::Cache,
and it isn't exactly caching files
either. IO::FileCache highlights
the relationship.

Re:Ummm

2shortplanks on 2002-06-30T16:14:18

See! It is confusing. I was talking about the File::Cache module that's been replaced by Cache::Cache.

Confuscioun (sic)

belg4mit on 2002-06-30T16:18:45

Umm, except that's not CORE now, is it?
And you did say CORE earlier, no? :-P

And what do you mean you were talking
about? You've been talking about
cacheout, that's in FileCache...