Wrapping pack()

jdavidb on 2003-04-01T17:31:34

In all of my days, I can only remember using pack() once in a production application. That's not because it's not useful; it's because I'm fortunate to deal almost exclusively in text in my work. I know I've used it in some tinier things but none come to mind.

Having little pack() experience, I find pack templates to be as incomprehensible as regexes to a VB programmer. Without online reference material I'd never have a chance.

So here's a thought: most uses of pack should be wrapped up in a small subroutine, shouldn't they? I mean, if you are unpacking a struct, make two routines (to convert either direction) that take binary data on one end and an array, hash, or object on the other. If you're performing some other conversion, say, converting epoch seconds to Julian dates with a brilliant use of pack that shaves .001 microseconds off of each iteration, make a subroutine called epoch2julian() that contains your pack call.
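As a rough sketch of the two-routine idea, assuming a made-up record layout (the field names id, flags, and name, and the template itself, are invented for illustration and come from no real format):

```perl
use strict;
use warnings;

# Hypothetical record layout, invented for illustration:
#   id    - unsigned 32-bit integer, big-endian  (N)
#   flags - unsigned 16-bit integer, big-endian  (n)
#   name  - 8 bytes, space-padded string         (A8)
my $RECORD_TEMPLATE = 'N n A8';

# Turn a hash of fields into the binary record.
sub pack_record {
    my (%field) = @_;
    return pack $RECORD_TEMPLATE, @field{qw(id flags name)};
}

# Turn the binary record back into a hash reference.
# A8 strips the trailing padding again on unpack.
sub unpack_record {
    my ($bytes) = @_;
    my %field;
    @field{qw(id flags name)} = unpack $RECORD_TEMPLATE, $bytes;
    return \%field;
}

my $bytes  = pack_record(id => 42, flags => 3, name => 'perl');
my $record = unpack_record($bytes);
print "$record->{id} $record->{flags} $record->{name}\n";
```

The template lives in one place, so the two directions cannot drift apart, and callers only ever see a hash.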

Sticking unexplained pack() calls right in the main code might be quick and easy, but think what you can gain with a well named subroutine with an obvious purpose that hides pack() away where you don't have to see it except when you want to. Think how you could write tests on just that subroutine. Think how the name makes the purpose clear. Think how the inexperienced programmer doesn't have to look up pack() if he knows he can trust that conversion routine. Think how you don't have to fool with it again if you know you can trust the routine.

Oh, I seem to be gushing. I'm suddenly reminded of perlstyle:

Think about reusability. Why waste brainpower on a one-shot when you might want to do something like it again? Consider generalizing your code. Consider writing a module or object class. Consider making your code run cleanly with use strict and use warnings (or -w) in effect. Consider giving away your code. Consider changing your whole world view. Consider... oh, never mind.


Factoring

ziggy on 2003-04-01T19:42:36

So here's a thought: most uses of pack should be wrapped up in a small subroutine, shouldn't they? I mean, if you are unpacking a struct, make two routines (to convert either direction) that take binary data on one end and an array, hash, or object on the other.

Whoa, boy. You've got a good idea, but you're starting to overgeneralize.

What you're talking about is properly factoring your code. Your program is made up of a string of little operations performed in a meaningful order. One of those operations, conceptually, is serialize_struct(@), while another is deserialize_struct($). (Or save/restore, or marshall/unmarshall, or read/write, or any other pair of similarly named operations.)

If this operation is performed many times in a program, then it is worthwhile to place this operation in a single descriptively named sub. It's somewhat irrelevant that the operation is pack/unpack; it's a black box. If there's a bug, then you want to fix it once.

On the other hand, I disagree that pack/unpack is a seldom used operation that should always be wrapped in a (descriptively named) sub. Down that path lies madness. The solution is to learn the language. If you cling to wrapping every piece of obscure syntax in a sub, then where does it all end? Do regexes and substitutions belong there? If you don't use splice very often, should it be wrapped as well?

Sticking unexplained pack() calls right in the main code might be quick and easy, but think what you can gain with a well named subroutine with an obvious purpose that hides pack() away where you don't have to see it except when you want to.

That's the Ostrich's solution. Don't hide from the language, learn how to use it.

There are times where a single pack/unpack call is useful as a single standalone operation. If you think the usage is obscure, you could wrap it in a descriptively named sub. A better solution would be to comment the usage in place to make it less obscure.
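For instance, an in-place unpack with its template explained field by field might look like the sketch below. The 6-byte header and the meaning given to each field are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical 6-byte header, invented for illustration.
my $header = "\x00\x00\x01\x00\x00\x02";

# Template, field by field:
#   N - 32-bit unsigned integer, big-endian ("network" order): payload length
#   n - 16-bit unsigned integer, big-endian: protocol version
my ($length, $version) = unpack 'N n', $header;

print "length=$length version=$version\n";
```

The comment costs three lines and saves the next reader a trip to the pack documentation.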

Think how you could write tests on just that subroutine. Think how the name makes the purpose clear. Think how the inexperienced programmer doesn't have to look up pack() if he knows he can trust that conversion routine. Think how you don't have to fool with it again if you know you can trust the routine.

On the other hand, think about high level vs. low level operations. A Perl programmer should never need to test that chomp($a = "foo\n") yields $a eq "foo". There are times when a single high level operation will be concisely expressed in a pack, and should be wrapped in a sub. There are other times when pack is but one part of a single conceptual operation that should be tested as a whole -- otherwise, you're just writing tests to prove that 1+1 is still equal to 2. :-)

Re:Factoring

chromatic on 2003-04-01T20:15:47

The solution is to learn the language.

That idea alone could save buckets of time writing workarounds and useless comments.

... otherwise, you're just writing tests to prove that 1+1 is still equal to 2.

Hey, I know a couple of people who've written tests just like that. Aren't you glad we aren't breaking chomp anymore?

Re:Factoring

jdavidb on 2003-04-01T20:29:59

That idea alone could save buckets of time writing workarounds and useless comments.

A tradition at my old workplace persisted for years, and was passed on to every intern and co-op, that when you wrote:

open(FILE, $filename) || die "Cant open file: $!";

you should leave out the apostrophe in the word "Can't," because that once broke something. Obviously, what it broke was somebody who tried to use single quotes for the whole thing, but that part wasn't understood.
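A small sketch of the quoting difference behind that superstition (the $err variable is a stand-in for $!, used here only so the output is predictable):

```perl
use strict;
use warnings;

my $err = 'No such file or directory';    # stand-in for $!

# Double quotes: the apostrophe is an ordinary character and $err interpolates.
my $double = "Can't open file: $err";

# Single quotes: the apostrophe must be escaped, and $err is NOT interpolated.
my $single = 'Can\'t open file: $err';

# Writing 'Can't open file: $!' with no escape ends the string after "Can"
# and turns the rest of the line into a syntax error.

print "$double\n$single\n";
```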

Re:Factoring

nicholas on 2003-04-02T13:59:02

Hey, I know a couple of people who've written tests just like that. Aren't you glad we aren't breaking chomp anymore?

Well, I added some tests to make sure that 1 + 1 == 2: t/op/arith.t.

Which reminds me - Schwern is not poorer yet :-(

Re:Factoring

jdavidb on 2003-04-01T20:27:14

You've got a good idea, but you're starting to overgeneralize.

Story of my life, I'll admit. :)

On the other hand, I disagree that pack/unpack is a seldom used operation that should always be wrapped in a (descriptively named) sub.

Oh, I don't believe it's a seldom used operation. It's seldom used by me, but I know it's in wide use across the world. And I fully agree with you that the solution is to learn the language. I'm not trying to argue that everyone should put pack in a sub to make me happy; I'm trying to say that while thinking of pack I came up with this idea that might help some people and wanted to float it around. [Hmm, you just reminded me of the boss who decreed we could never use if at the end of a statement because it confused him. He got agreement from everyone on the team but me, because they were all new but me.]

If you cling to wrapping every piece of obscure syntax in a sub, then where does it all end? Do regexes and substitutions belong there?

A very good point.

A better solution would be to comment the usage in place to make it less obscure.

A lot of the Oracle books I'm reading push me in the direction of making clearly named subprograms in lieu of comments. Since lots of developers don't like comments, it seems like it's worth considering, although I agree with you that something that is obscure (not something that's just an idiom I don't know, but something that is truly obscure) should be commented to explain it. Whether wrapped in a sub or not, actually.

There are other times when pack is but one part of a single conceptual operation that should be tested as a whole

That was actually kind of at the back of my mind, but I didn't mention it. I was thinking more of subroutines that "pack and do a couple of other things," although I was encouraging the thought of going ahead and sticking all packs in a subroutine call.

you're just writing tests to prove that 1+1 is still equal to 2

We will never have to worry about that until Perl 6, because backward compatibility is always guaranteed! ;)

Thanks for being a useful counterpoint to my thinking.

Re:Factoring

ziggy on 2003-04-01T20:41:47

A lot of the Oracle books I'm reading push me in the direction of making clearly named subprograms in lieu of comments. Since lots of developers don't like comments, it seems like it's worth considering, although I agree with you that something that is obscure (not something that's just an idiom I don't know, but something that is truly obscure) should be commented to explain it. Whether wrapped in a sub or not, actually.

Yes, this is the general idea behind factoring. The best texts I can recommend are Leo Brodie's FORTH books, "Starting FORTH" and "Thinking FORTH". FORTH is a language that demands proper factoring; without it, it would be impossible to get anything written or maintain a program for periods longer than a workweek. They're both out of print, so good luck finding them.

The key to factoring is learning when to factor. If you can turn a single pack into one logical and reusable operation, then it belongs in a sub. If it is just a piece of obscure syntax, throw some commentary around it.

Re:Factoring

dws on 2003-04-01T22:59:02

Don't hide from the language, learn how to use it.

I've used Perl for over 5 years now, and consider myself to be reasonably proficient. Yet I'll admit to puzzling over the docs whenever I try to develop anything beyond trivial pack/unpack formats. The docs could use some good examples.

Re:Factoring

nicholas on 2003-04-02T14:02:35

pack/unpack formats. The docs could use some good examples.

5.8.0 introduced a pack tutorial, perlpacktut, by Simon Cozens and Wolfgang Laun.

Re:Factoring

jdavidb on 2003-04-02T14:52:43

5.8.0 introduced a lot of great new documentation.

Re:Factoring

dws on 2003-04-02T16:29:15

That new tutorial is much nicer than what's available in 5.6.1.

/me raises hand...

bart on 2003-04-03T23:00:10

So here's a thought: most uses of pack should be wrapped up in a small subroutine, shouldn't they? I mean, if you are unpacking a struct, make two routines (to convert either direction) that take binary data on one end and an array, hash, or object on the other.

Eh... I have to confess, I've written pretty much such a module... I haven't released it to the public, and likely I never will, because, well, you can read the responses from other people here.

I wrote it when trying to convert the utility program macfont, which is originally written in C, to Perl. A huge part of that program is reading structures from the font file, interpreting the data, and following pointers. It would have been much harder to write without the help of this little module.

So how does it work? You create a Pack::Struct object, while specifying the structure you're trying to handle. This specification gets used as a template for both pack() and unpack(). There's no real saving there; you still have to learn about the basic templates for pack. But it also doubles as a list of keys for the hash, so when you unpack the structure into a record, it returns a hash filled with the unpacked data. The main advantage is that it reduces redundancy. As a drawback, it poses some restrictions on the kind of structures you can read. Most of all, the basic templates must act symmetrically in pack() and unpack(). Variable-length data is out; fixed-length records only. Included arrays and even nested structures are possible. There's even a callback mechanism to be able to provide strangely formatted fields, such as a 3-byte integer.

As a result, the Perl code is at least as readable as the C source code (far more so, if you ask me, but I'm biased). Let me give you a code example, a snippet from my script...

use Pack::Struct;
my $RsrcHdrStruct = new Pack::Struct(
  DataOffset => 'N',
  MapOffset => 'N',
  DataLen => 'N',
  MapLen => 'N',
  OSReserved => ['x96'],
  AppReserved => ['x128']
);

read INPUT, $_, $RsrcHdrStruct->length;
my $Mac_header = $RsrcHdrStruct->unpack($_);  # This returns a hash ref

seek INPUT, $Mac_header->{MapOffset}, 0;

my $RsrcMapStruct = new Pack::Struct (
  MapCopy => ['N' => 4 ],
  NextMap => 'N',
  FileRef => 'n',
  FileAttr => 'n',
  TypeOffset => 'n',
  NameOffset => 'n'
);
read INPUT, $_, $RsrcMapStruct->length;
my $Mac_map = $RsrcMapStruct->unpack($_);

seek INPUT, $Mac_header->{MapOffset} + $Mac_map->{TypeOffset}, 0;
read INPUT, $_, 2;
my $Mac_typeCount = 1 + unpack 'n', $_;
# etc...

You can compare this to the C structs of the same name in mactypes.h (lines 31-47); to the code to read the structs from the file, in macio.c containing the definitions of the functions read_mac_header() and read_mac_map() (lines 80-114); and the snippet that knits it all together by calling these two functions, in macfont.c (lines 721-732).

Hmm... Now that I look at it again, despite everything, perhaps it might be an interesting module to release, anyway?
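Since Pack::Struct itself is unreleased, its internals are unknown. The sketch below is only a guess at the core idea described above, where one ordered specification serves as both the pack template and the list of hash keys. The class name TinyStruct and the method names size, to_bytes, and to_hash are hypothetical, and arrays, nested structures, and callbacks are left out.

```perl
use strict;
use warnings;

package TinyStruct;    # hypothetical name; NOT the Pack::Struct described above

# The constructor takes an ordered list of name => template pairs.
sub new {
    my ($class, @spec) = @_;
    my (@names, @templates);
    while (my ($name, $template) = splice @spec, 0, 2) {
        push @names,     $name;
        push @templates, $template;
    }
    return bless {
        names    => \@names,
        template => join(' ', @templates),
    }, $class;
}

# Size in bytes of one packed record (fixed-length templates only).
sub size {
    my ($self) = @_;
    return length(pack($self->{template}, (0) x @{ $self->{names} }));
}

# Hash reference in, binary string out.
sub to_bytes {
    my ($self, $record) = @_;
    return pack $self->{template}, @{$record}{ @{ $self->{names} } };
}

# Binary string in, hash reference out.
sub to_hash {
    my ($self, $bytes) = @_;
    my %record;
    @record{ @{ $self->{names} } } = unpack $self->{template}, $bytes;
    return \%record;
}

package main;

my $header = TinyStruct->new(
    DataOffset => 'N',
    MapOffset  => 'N',
    DataLen    => 'N',
    MapLen     => 'N',
);

my $bytes = $header->to_bytes(
    { DataOffset => 256, MapOffset => 1024, DataLen => 768, MapLen => 50 }
);
my $fields = $header->to_hash($bytes);
printf "%d bytes, MapOffset=%d\n", $header->size, $fields->{MapOffset};
```

Keeping the names and templates in one ordered list is what removes the redundancy: adding a field changes the template and the hash keys in the same place.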

Re:/me raises hand...

jdavidb on 2003-04-04T14:01:59

Eh... I have to confess, I've written pretty much such a module... I haven't released it to the public, and likely I never will, because, well, you can read the responses from other people here.

Why not????? Don't let the opinions of people sway your view of how useful the module is, nor let you pass up a chance to submit it to others to see how useful they think it is. We operate on the economy of ideas. Just because your idea doesn't work for some people doesn't mean everyone will dislike it. That's like saying everyone should use strict all the time. What would you suffer if you released the module? It's not like you're asking to put it into core.

And I have to say, what you describe here is beyond what I was thinking of in this journal entry. You are talking about something much more general, which is good. (I had such an idea, once, but you will notice I produced no code from it.)

Now that I look at it again, despite everything, perhaps it might be an interesting module to release, anyway?

Emphatically yes. I am certain many people will find it useful.

Re:/me raises hand...

jplindstrom on 2003-04-04T17:04:26

I think it sounds like a good idea. It also sounds a lot like the newly released Win32::API where I think you can do something similar when calling functions with C structs.