What is the sound of one head thudding?

schwern on 2005-10-14T02:41:03

Is it not the sound of frustration?

I'm trying to write a module that, given a filehandle whose contents are zipped, gzipped or PGP-encoded (zero or more of those layers, and possibly more formats in the future), unpacks the contents of the filehandle. The trick is that everything has to be done with filehandles: filehandles in and filehandles out. At no time should the files exist on disk (security reasons, very sensitive data). The moral equivalent of:

    system("gunzip -c foo.pgp.gz | gpg --decrypt | myprogram");


Except it's more complicated, because zip archives can contain multiple files. Ultimately, I want to be able to write this:

    # Typically $input_fh comes from Net::SFTP.
    my %files = Extractor->extract($input_fh);

    while( my($filename, $fh) = each %files ) {
        _filter_out_sensitive_information_and_write($fh, $filename);
    }


For example, if $input_fh is a handle to a zip file containing "foo" and "bar", %files will contain the keys "foo" and "bar", and the values will be open filehandles to their contents. The filter reads and munges the contents of each filehandle and writes the result to the associated file.
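A minimal sketch of the return side of that interface, assuming some library has already decoded each member into a string: Perl 5.8's in-memory filehandles (open on a scalar reference) turn each string into a real filehandle without anything touching the disk. Extractor and handles_from_strings here are hypothetical names, not the actual module, and the cost of this approach is that every decoded file sits in memory in full.

```perl
use strict;
use warnings;

package Extractor;    # hypothetical sketch, not the module from this post

# Given (name => decoded contents) pairs, return (name => filehandle) pairs.
# Each handle reads from an in-memory scalar, so nothing is written to disk.
sub handles_from_strings {
    my ($class, %contents) = @_;
    my %files;
    for my $name (keys %contents) {
        my $data = $contents{$name};    # private copy; the handle keeps it alive
        open my $fh, '<', \$data
            or die "Can't open in-memory handle for $name: $!";
        $files{$name} = $fh;
    }
    return %files;
}

package main;

# Stand-in for what an archive library might decode from a zip with two members.
my %files = Extractor->handles_from_strings(
    foo => "first line of foo\n",
    bar => "first line of bar\n",
);

while (my ($filename, $fh) = each %files) {
    my $line = <$fh>;
    print "$filename: $line";
}
```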

What's kicking my ass is that many of the existing unarchiving modules don't return filehandles. Archive::Zip and Compress::Zlib both return objects. I'm using the less-than-desirable GnuPG::Interface instead of Crypt::GPG simply because GnuPG::Interface uses filehandles. I tried tying the objects to a handle, but the tie seems to get lost once the handle is copied. I played a bit with wrapping them inside IO::Handle glob objects, but that's just too weird even for me.

My final thought is that, since I control the filters, I could write an adaptor around each nearly-a-filehandle object that Archive::Zip or Compress::Zlib returns and have the filters use that. But that seems like a lot of work when real filehandles would do everything I want.

Am I missing something?

UPDATE:

Yes I am. And it is this.

    assert tied $fh;


That returns false. THIS returns true.

    assert tied *$fh;


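*sigh* I was doing the former, which made me think the returned filehandle was not tied. tied $fh asks whether the scalar variable $fh is tied; a tied filehandle lives in the glob that $fh points to, so the right question is tied *$fh. Trouble is, the internal testing system we use is xUnit-style: it dies on the first failure, so I never saw the next test, which checks the contents of the filehandle.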
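A small demonstration of the difference, using a made-up tie class (ArrayHandle is hypothetical, for illustration only). tie *$fh ties the glob that $fh refers to, so tied *$fh (and tied *$copy, for a copy of the reference) returns the tie object, while tied $fh looks at the plain scalar $fh, which was never tied. Copying the reference doesn't lose the tie either.

```perl
use strict;
use warnings;
use Symbol qw(gensym);

# Made-up minimal tie class that serves lines from an array.
package ArrayHandle;
sub TIEHANDLE { my ($class, @lines) = @_; return bless { lines => [@lines] }, $class }
sub READLINE {
    my $self = shift;
    return wantarray ? splice(@{ $self->{lines} }) : shift @{ $self->{lines} };
}
sub EOF   { my $self = shift; return !@{ $self->{lines} } }
sub CLOSE { return 1 }

package main;

my $fh = gensym;                    # reference to a fresh anonymous glob
tie *$fh, 'ArrayHandle', "hello\n", "world\n";
my $copy = $fh;                     # copies the reference, not the glob

print "tied \$fh:    ", (tied($fh)     ? "true" : "false"), "\n";   # false
print "tied *\$fh:   ", (tied(*$fh)    ? "true" : "false"), "\n";   # true
print "tied *\$copy: ", (tied(*$copy)  ? "true" : "false"), "\n";   # true
print scalar(<$copy>);              # hello
```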

Reason #139892 why aborting on failure is EVIL!


Another angle

mir on 2005-10-14T15:34:03

At no time should the files exist on disk (security reasons, very sensitive data).

What about using an encrypted filesystem? Or "plain" encrypted files for every step?

Re: Another angle

schwern on 2005-10-14T23:08:28

What about using an encrypted filesystem?
I don't have that level of control over the system. One trick I can use, which is already used elsewhere in the code, is an unlinked temp file: open a temp file, then unlink it. The file is still on disk, but it no longer has a name, so no other process can open it; only my open filehandle can read it. Once my filehandle goes away, the file goes away. (Yes, the data might hang around on the disk for a while before it's wiped. That's OK.) It won't get backed up.
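A sketch of the unlinked-temp-file trick using the core File::Temp module, assuming a Unix-like system (on Windows an open file generally can't be unlinked). anonymous_tempfile is a hypothetical helper name, and the link-count check is only there to show that no directory entry is left.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: create a temp file, then remove its directory entry.
# Afterwards the data is reachable only through the returned handle, and the
# space is released once that handle is closed. Assumes a Unix-like system.
sub anonymous_tempfile {
    my ($fh, $path) = tempfile();
    unlink $path or die "Can't unlink $path: $!";
    binmode $fh;
    return $fh;
}

my $fh = anonymous_tempfile();
print {$fh} "very sensitive data\n";
seek($fh, 0, 0) or die "Can't rewind: $!";   # the handle is open read/write
my $line  = <$fh>;
my $links = (stat $fh)[3];                   # link count: 0 once unlinked
print "links: $links\nread back: $line";
```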

But this requires each extraction step to read the file in and write it back out, and then the filters have to read it in and write it out again. That seems like a lot of unnecessary reads and writes. Then again, extracting and filtering aren't the bottleneck in this process; the processing done on the resulting decoded and filtered file is.
Or "plain" encrypted files for every step?
What does that mean?

Tied filehandles

bart on 2005-10-16T02:07:44

Surely you can make it work using tied filehandles. It won't be dead easy, it won't be the fastest, but at least you can make it behave like filehandles.
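A minimal sketch of such an adaptor, under loud assumptions: ChunkSource and its next_chunk method are hypothetical stand-ins for whatever object a library like Archive::Zip or Compress::Zlib hands back, not their real APIs, and only READLINE, READ, EOF and CLOSE are implemented (a writable handle would also need PRINT and friends).

```perl
use strict;
use warnings;
use Symbol qw(gensym);

# Hypothetical stand-in for a library object that decodes data piecemeal.
package ChunkSource;
sub new        { my ($class, @chunks) = @_; return bless { chunks => [@chunks] }, $class }
sub next_chunk { my $self = shift; return shift @{ $self->{chunks} } }

# Tied-handle adaptor: any object with next_chunk() can sit behind a handle.
package ChunkHandle;

sub TIEHANDLE {
    my ($class, $source) = @_;
    return bless { source => $source, buf => '', done => 0 }, $class;
}

# Pull chunks until the buffer holds $want bytes (or, if $want is undef,
# a complete line) or the source is exhausted.
sub _fill {
    my ($self, $want) = @_;
    until ($self->{done}
           || (defined $want ? length($self->{buf}) >= $want
                             : index($self->{buf}, "\n") >= 0)) {
        my $chunk = $self->{source}->next_chunk;
        if (defined $chunk) { $self->{buf} .= $chunk }
        else                { $self->{done} = 1 }
    }
}

sub READLINE {
    my $self = shift;
    if (wantarray) {                  # <$fh> in list context: all lines
        my @lines;
        while (defined(my $line = $self->READLINE)) { push @lines, $line }
        return @lines;
    }
    $self->_fill(undef);
    return undef unless length $self->{buf};
    my $nl  = index($self->{buf}, "\n");
    my $len = $nl >= 0 ? $nl + 1 : length $self->{buf};
    return substr($self->{buf}, 0, $len, '');    # remove and return one line
}

# read($fh, $buf, $len, $offset); negative offsets are not handled here.
sub READ {
    my ($self, undef, $len, $offset) = @_;
    $offset = 0 unless defined $offset;
    $self->_fill($len);
    my $data = substr($self->{buf}, 0, $len, '');
    my $old  = defined $_[1] ? $_[1] : '';
    $old .= "\0" x ($offset - length $old) if $offset > length $old;
    $_[1] = substr($old, 0, $offset) . $data;    # $_[1] aliases the caller's buffer
    return length $data;
}

sub EOF   { my $self = shift; $self->_fill(1); return length($self->{buf}) == 0 }
sub CLOSE { my $self = shift; @$self{qw(buf done)} = ('', 1); return 1 }

package main;

# Wrap a source object in a real glob so callers get an ordinary handle.
sub handle_for {
    my ($source) = @_;
    my $fh = gensym;
    tie *$fh, 'ChunkHandle', $source;
    return $fh;
}

my $fh = handle_for(ChunkSource->new("line one\nline ", "two\n", "tail"));
while (my $line = <$fh>) {
    chomp $line;
    print "got: $line\n";
}
```

handle_for returns a glob reference with the tie on the glob, so callers can use <$fh>, read and eof as usual; to check for the tie, ask tied *$fh rather than tied $fh.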

Another option would be to make the readers separate processes that print the virtual file contents to their STDOUT, which you can then pipe back into your program.
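A sketch of the separate-process approach, assuming a Unix-like system: open with the '-|' mode and no command forks, and the parent reads from the returned handle whatever the child prints to STDOUT, so the data travels through a pipe rather than a file. reader_handle is a hypothetical name, and the child here prints a fixed string where a real reader would decode an archive member.

```perl
use strict;
use warnings;

# Hypothetical helper: run $produce in a child process and return a handle
# from which the parent reads everything the child prints to STDOUT.
sub reader_handle {
    my ($produce) = @_;
    my $pid = open(my $fh, '-|');
    die "Can't fork: $!" unless defined $pid;
    if ($pid == 0) {          # child: STDOUT is the write end of the pipe
        $produce->();
        exit 0;
    }
    return $fh;               # parent: read end of the pipe
}

my $fh = reader_handle(sub { print "contents of foo\n" });
my @lines = <$fh>;
close $fh or warn "child exited with status $?\n";
print "parent read: $_" for @lines;
```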

Re:Tied filehandles

schwern on 2005-10-16T17:24:21

Surely you can make it work using tied filehandles. It won't be dead easy, it won't be the fastest, but at least you can make it behave like filehandles.
BLALHRARHALHBLAHGH!

I just figured it out. See edited post.