A tale of two orders

brian_d_foy on 2004-02-05T22:09:45

I have been developing Mac::iTunes on a Mac---no surprise there---but I meant for parts of it to work anywhere. Perl is mostly architecture independent, but I got bitten by one of the bits it cannot help.

As part of the parser that reads the binary iTunes database file, I read two or four bytes and then unpack them into shorts or longs, which, in Perl, are just numbers.

Now, on a PowerPC, the big bits show up on the left, just like our thousands place shows up to the left of the hundreds place. This is not true on all processors though. Some store the bytes in a different order, so the bits end up all over the place. Ick!

On the Mac, my parsing stuff worked because I do not have to worry about byte ordering, and the binary file has the bytes in the right order. When I tested the module on another unix account yesterday, everything exploded, sending goofy characters all over the screen and messing up my terminal (a Windows client that is far inferior to Terminal). Somewhere the parser read something wrong, got confused, and started slurping bytes it should not have slurped. Things that were not strings became strings with weird characters.

At first I suspected a unicode boo-boo, since I had started to muck around with the unicode strings that iTunes uses, but "use bytes" did not do anything to help.

Now, since this explosion messed up my terminal, I had a hard time getting output I could read. I tried to redirect stderr to stdout and then stdout to a file (in BASH), but that did not work. I found a little BASH trick that did, though, and I still do not know why it worked and the other did not.

# does not work
% perl script 2>&1 > test.out

# works
% perl script &> test.out

I took the output back to my computer and stared at it for a bit. Really, just looked at it while I ate some Spaghettios I heated in my canteen cup. I wondered how many bytes the first string was, for some reason. I counted 1280. That number does not look special to me, but I knew it was supposed to be 5.

I thought for a moment. I wondered "What is 5 if the byte order is reversed?", figuring I would get some other strange number. If I have the string "\000\005", and I turn it into a short with unpack, I get 5 on my PowerPC. If I turn it into a short on an Intel, I get...yep, I get 1280.
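The arithmetic behind that jump from 5 to 1280 can be checked by hand. This is only a sketch that weights the two bytes both ways; the variable names are mine and are not part of Mac::iTunes.

```perl
use strict;
use warnings;

my $bytes = "\000\005";    # the two bytes as they sit in the file
my ( $first, $second ) = map { ord($_) } split //, $bytes;

# Big-endian reading: the first byte is the high-order byte.
my $big_endian = $first * 256 + $second;

# Little-endian reading: the first byte is the low-order byte.
my $little_endian = $second * 256 + $first;

print "big-endian: $big_endian, little-endian: $little_endian\n";
# prints "big-endian: 5, little-endian: 1280"
```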

Huzzah.

I now have to deal with this in my code. I know unpack can figure this out because there are network order and VAX order formats, although I always get them mixed up. I need to turn this unpack into something that gives the same answer on both architectures.

unpack( "S", $data );

With two options, either the network or VAX order, I get it on the second try.

unpack( "n", $data );
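To see why a wrong length sends a parser off slurping, here is a small sketch. The record layout (a two-byte big-endian length followed by that many bytes of text) and the read_string helper are hypothetical, made up for illustration, and are not the real iTunes format or the Mac::iTunes code.

```perl
use strict;
use warnings;

# Hypothetical record: 2-byte big-endian length, then that many bytes of text.
my $data = "\000\005Hello" . "more data follows";

open( my $fh, '<', \$data ) or die "Cannot open in-memory file: $!";
binmode($fh);

# Read a length with the given unpack template, then read that many bytes.
sub read_string {
    my ( $handle, $template ) = @_;
    read( $handle, my $length_bytes, 2 ) == 2 or die "Short read on length";
    my $length = unpack( $template, $length_bytes );
    read( $handle, my $string, $length );
    return ( $length, $string );
}

# 'n' is network (big-endian) order, so the length is 5 on any machine.
my ( $length, $string ) = read_string( $fh, 'n' );
print "length $length, string '$string'\n";

# 'v' is VAX (little-endian) order: the length becomes 1280 and the
# read swallows everything that is left in the data.
seek( $fh, 0, 0 );
my ( $bad_length, $bad_string ) = read_string( $fh, 'v' );
print "length $bad_length, got ", length($bad_string), " bytes\n";
```

With "S" the result depends on the machine running the code, which is exactly the behavior described above.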

Thus, with this fix, *BSD folks (and, if they play nicely, maybe the Linux folks) can parse the iTunes database format too. They will even find a spiffy new Makefile.PL that does not bother them with all the Mac specific bits.

Bash stuph

Ovid on 2004-02-05T22:32:11

# does not work
% perl script 2>&1 > test.out

I'm not entirely certain why, but when you do that, only what would originally have been sent to STDOUT is redirected to test.out and your warnings will go to STDOUT. Change that to:

% perl script > test.out 2>&1

# works
% perl script &> test.out

Never seen that before, but then, I still don't understand Linux terribly well.

Re:Bash stuph

vsergu on 2004-02-05T22:35:59

From man bash:

Note that the order of redirections is significant. For example, the command

ls > dirlist 2>&1

directs both standard output and standard error to the file dirlist, while the command

ls 2>&1 > dirlist

directs only the standard output to file dirlist, because the standard error was duplicated as standard output before the standard output was redirected to dirlist.

Re:Bash stuph

Ovid on 2004-02-05T22:43:26

man bash: directions for using the feminist shell :)

Re:Bash stuph

iburrell on 2004-02-06T18:49:19
It is easy to figure out if you remember that bash does the redirections in the order specified, and that it uses dup to do the redirections, making a copy of the current file descriptor. You will get the same behavior from Perl code.

# like 2>&1 > dirlist
open(STDERR, ">&STDOUT"); open(STDOUT, ">dirlist");

# like > dirlist 2>&1
open(STDOUT, ">dirlist"); open(STDERR, ">&STDOUT");

BTW, bash has a shorthand for redirecting stdout and stderr together.

ls &> file
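The dup behavior can be checked from Perl with a sketch like the following. It assumes a temporary file can stand in for the terminal; the where_output_lands and slurp helpers and the step names are hypothetical, invented for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Handle;

# Return the whole contents of a file as one string.
sub slurp {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Cannot read $path: $!";
    local $/;
    my $content = <$fh>;
    return defined $content ? $content : '';
}

# Apply the redirections in the given order, print one line to each
# stream, and report which file each line ended up in.
sub where_output_lands {
    my (@steps) = @_;
    my $dir      = tempdir( CLEANUP => 1 );
    my $terminal = "$dir/terminal";    # stand-in for the screen
    my $dirlist  = "$dir/dirlist";

    # Keep copies of the real handles so they can be restored afterwards.
    open( my $real_out, '>&', \*STDOUT ) or die "Cannot dup STDOUT: $!";
    open( my $real_err, '>&', \*STDERR ) or die "Cannot dup STDERR: $!";

    # Both streams start out pointing at the stand-in terminal.
    open( STDOUT, '>',  $terminal ) or die "Cannot open $terminal: $!";
    open( STDERR, '>&', \*STDOUT )  or die "Cannot dup STDOUT: $!";

    for my $step (@steps) {
        if ( $step eq '2>&1' ) {
            # Copy whatever STDOUT points at right now.
            open( STDERR, '>&', \*STDOUT ) or die "Cannot dup STDOUT: $!";
        }
        elsif ( $step eq '>dirlist' ) {
            open( STDOUT, '>', $dirlist ) or die "Cannot open $dirlist: $!";
        }
    }

    STDOUT->autoflush(1);
    STDERR->autoflush(1);
    print STDOUT "out\n";
    print STDERR "err\n";

    # Put the real handles back.
    open( STDOUT, '>&', $real_out ) or die "Cannot restore STDOUT: $!";
    open( STDERR, '>&', $real_err ) or die "Cannot restore STDERR: $!";

    return ( terminal => slurp($terminal), dirlist => slurp($dirlist) );
}

my %wrong_order = where_output_lands( '2>&1',     '>dirlist' );
my %right_order = where_output_lands( '>dirlist', '2>&1' );
```

In the first ordering STDERR keeps its copy of the old STDOUT, so "err" stays on the stand-in terminal. In the second, both lines land in dirlist.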

Re:Bash stuph

brian_d_foy on 2004-02-06T20:19:07
It only seems easy to remember, but in my mind it does not work properly. If I redirect something to stdout, then redirect stdout, in my mind anything in stdout should go to the new place. Alas, that is not the case.

ah-ha!

LTjake on 2004-02-06T14:33:07

Your post got me thinking...

In my File::SAUCE module I have a pack template. A test on a solaris machine was giving me messages like this:

t/20-read.........# Failed test (t/20-read.t at line 65)
# got: '256'
# expected: '1'

I was using 'S' in my template. So, on my win32 box, I plug in 'n' just to see what it would do. It was giving me the same errors as above.

So, as per perlport, I now do an endian-ness check and use 'n' or 'S' when needed.
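One way such a check might look is sketched below. It compares the native 'S' packing against the fixed-order templates; this is an assumption about the approach, not the code from File::SAUCE, and the sample field is hypothetical.

```perl
use strict;
use warnings;

# 'S' packs in the machine's native order, 'n' is always big-endian and
# 'v' is always little-endian, so comparing them reveals the native order.
my $is_big_endian    = pack( 'S', 1 ) eq pack( 'n', 1 );
my $is_little_endian = pack( 'S', 1 ) eq pack( 'v', 1 );

print $is_big_endian ? "big-endian\n" : "little-endian\n";

# Hypothetical field: a 16-bit value stored low byte first in the file.
# A fixed-order template reads it the same way on every platform.
my $field = "\001\000";
my $value = unpack( 'v', $field );
print "value: $value\n";
```

If the file format always stores its integers in one order, using 'n' or 'v' directly avoids the runtime check altogether.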

I think I'm closer to having my tests pass! Thanks!

Big endian, little endian

iburrell on 2004-02-06T19:02:23

I thought it was pretty interesting how you puzzled out that byte orders were different on different platforms and how to work around it. I thought all programmers knew about little endian and big endian byte orders.

Then I realized that Perl does an excellent job of hiding byte order. With Perl, it is much less common to read binary structures than in C. Since pack does a good job of handling the differences, byte order just becomes part of the specification of the format.

Basically, there are two different orders for the bytes of integers in binary form. Big endian puts the most significant byte first. Little endian puts the least significant byte first. x86 is little endian, 68000 is big endian, PowerPC can do both but Macs are big endian for compatibility. Network order is big endian.
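A short sketch makes the two layouts visible by packing the same 32-bit number both ways and dumping the bytes as hex. The number is arbitrary.

```perl
use strict;
use warnings;

my $number = 0x01020304;

# 'N' is a 32-bit integer in network (big-endian) order,
# 'V' is a 32-bit integer in VAX (little-endian) order.
my $big    = unpack( 'H*', pack( 'N', $number ) );
my $little = unpack( 'H*', pack( 'V', $number ) );

print "big endian:    $big\n";       # 01020304
print "little endian: $little\n";    # 04030201
```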

Wikipedia has an excellent article on Endianness.

Re:Big endian, little endian

brian_d_foy on 2004-02-06T20:12:29
I do know about byte orders. I just have not had to deal with it for a long time, so I was not thinking about it. All I knew when I started was that the function was reading too many bytes, and, as usual, I started by looking at changes to the code I had made recently.

My first battle with endianness was moving a couple of gigabytes of data from an Intel machine to a Motorola-based one. To make the analysis of this data easier, I needed to flip around the byte order of the longs. C was taking too long, so I learned 68k assembly, which has some nice operations to do just that, and dropped the processing time down from hours to a couple of minutes.
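For comparison, flipping the byte order of a run of longs can be sketched in Perl with pack and unpack. The swap_longs name is hypothetical, and the sketch assumes the data length is a multiple of four bytes; it makes no claim about speed.

```perl
use strict;
use warnings;

# Read each group of four bytes as a little-endian long and write it
# back out as a big-endian long, which reverses the bytes in each group.
sub swap_longs {
    my ($data) = @_;
    return pack( 'N*', unpack( 'V*', $data ) );
}

my $intel    = "\x01\x02\x03\x04\x05\x06\x07\x08";
my $motorola = swap_longs($intel);
print unpack( 'H*', $motorola ), "\n";    # 0403020108070605
```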

Re:Big endian, little endian

pudge on 2004-02-11T03:28:16
I've had a little more experience than most people with byte order and Perl, primarily because of MacPerl work. But it also came into play with stuff like Storable and DB_File output when going from SPARCs or PPCs to Intels, or vice versa. Hurray for Storable's nstore()!
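A minimal sketch of nstore() follows. It writes a structure in network order so the file can be read back on a machine with a different byte order. The sample data and the temporary file are made up for the example.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

my %library = ( title => 'A tale of two orders', sizes => [ 5, 1280 ] );

my ( $fh, $filename ) = tempfile( UNLINK => 1 );
close $fh;

# nstore() is like store(), but writes integers in network order.
nstore( \%library, $filename ) or die "Cannot store to $filename: $!";

# retrieve() reads files written by either store() or nstore().
my $copy = retrieve($filename);
print "sizes: @{ $copy->{sizes} }\n";
```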