Why can't I -B $scalar ?

jdavidb on 2004-05-07T14:19:52

I want to be able to apply the -B test to the contents of an arbitrary scalar to see if it's binary or not. I've got files that are occasionally spewing junk at me; the first N,000,000 records may be just fine, but toward the end they turn into gibberish. I want to print out erroneous records, unless they are binary garbage, in which case I just want to print a statement that says so.

Thinking of looking into IO::Scalar or something...


Uses the current buffer

merlyn on 2004-05-07T16:38:23

If I haven't been misinformed, -B uses the current STDIO buffer of the filehandle for its value, so you could seek near the end and possibly get a different result for -B as the file became "more binary".

Re:Uses the current buffer

jdavidb on 2004-05-07T17:50:57

For the record, I decided to just match each line against \0 as I read it, and that seems to work fine for now. Not quite as advanced a heuristic as -B, but good enough.
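
A minimal sketch of that per-line NUL check, assuming one record per line. The helper names `has_nul` and `describe_bad_record` and the message wording are hypothetical, not taken from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: a record counts as binary garbage if it contains
# at least one NUL byte.
sub has_nul {
    my ($record) = @_;
    return $record =~ /\0/ ? 1 : 0;
}

# Hypothetical reporter for a record already known to be erroneous:
# print the record itself, unless it looks like binary junk.
sub describe_bad_record {
    my ($record) = @_;
    return has_nul($record)
        ? "bad record: (binary garbage)\n"
        : "bad record: $record\n";
}
```

Something like `print describe_bad_record($line)` would then go wherever a record fails validation. A single NUL is a much blunter signal than -B, so garbage that happens to contain no NUL bytes will still be printed verbatim.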

-B cheap plastic imitation

jhi on 2004-05-14T19:52:32

The heuristics seem to be pretty simple... if you ignore fancy bits like Unicode, locales, EBCDIC, MS-DOG line endings, and accept also the vertical tab as whitespace, the test for -B is pretty much

3 * tr/\0-\x07\x0e-\x1a\x1c-\x1f\x7f-\xff/\0-\x07\x0e-\x1a\x1c-\x1f\x7f-\xff/ > length

That is, printable and whitespace ASCII, backspace, and ESC are okay,
others are not, and if more than 1/3 of the characters are not okay, call it binary.
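
For use on an arbitrary scalar, the expression above can be wrapped in a sub. This is only a sketch of the simplified test from this comment, not of everything the built-in -B does; the name `looks_binary` and the choice to treat an empty or undefined value as "not binary" are assumptions.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the simplified heuristic: count the
# "not okay" bytes and call the data binary when they make up more
# than one third of its length. Assumes $data is a byte string.
sub looks_binary {
    my ($data) = @_;

    # Assumption: empty or undefined input is reported as not binary.
    return 0 unless defined $data && length $data;

    # With an empty replacement list and no /d, tr/// only counts
    # matching characters and leaves the string unchanged.
    my $odd = ($data =~ tr/\0-\x07\x0e-\x1a\x1c-\x1f\x7f-\xff//);

    return 3 * $odd > length($data) ? 1 : 0;
}
```

Called as `looks_binary($record)`, it returns 1 or 0 and works on a copy of the argument, so the caller's scalar is never modified.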

Re:-B cheap plastic imitation

jdavidb on 2004-05-24T15:05:46

That seems like the kind of thing that would be nice to expose in a function, in much the same way uc, lc, and glob started their lives.

Re:-B cheap plastic imitation

jhi on 2004-05-25T14:08:39

> That seems like the kind of thing that would be nice to expose in a function, in much the same way uc, lc, and glob started their lives.

I dunno... I think the heuristic is so weak (false positives for arguably "text" data, for example), and by definition a binary test (ta-dah!), as opposed to multivalued, that I find little value in exposing that logic. I think e.g. adding my snippet to the FAQ should be quite enough for those who need it.