Heatmap Ranges

Ovid on 2008-09-03T14:49:04

I'd like to take a series of values and convert them to numbers between 0 and 255, but on a logarithmic scale. My math studies were a long time ago and for the life of me I can't remember how to do this. I'm hoping to take a 2D table of values and project them onto an HTML table in a 'heatmap' fashion.


scale linear then log

slanning on 2008-09-03T15:30:49

Maybe I'm misunderstanding, but I think you want to first scale logarithmically: log($val) if it's log base 'e'. Then linearly: so find the max value in the logged sequence, then multiply each value by 255 and divide by the max. Here's a kind of generic script, if I got it right:

#!/usr/bin/perl
# convert @VALS to a log scale base $LOG_BASE
# and scaled linearly to $SCALE_MAX

use strict;
use warnings;
use List::Util qw(max);

my $SCALE_MAX = 1000;
my $LOG_BASE = 10;
my @VALS = (1, 10, 100, 1000);

main();

sub main {
    my @logvals = map { logN($_, $LOG_BASE) } @VALS;
    my $maxval = max @logvals;

    foreach my $val (@logvals) {
        print $val * $SCALE_MAX / $maxval, $/;
    }
}

sub logN {
    my ($val, $N) = @_;
    return ($N eq 'e') ? log($val) : log($val) / log($N);
}

Re:scale linear then log

veryrusty on 2008-09-06T13:04:47

You need to scale linearly first, before taking logs, in case the minimum value is less than 1. (Also, $SCALE_MAX should be 255.)

Instead, if you simplify the mathematics (and you know the minimum and maximum values), calculate the logarithmic 'scaling factor':

$scale = log( $maximum - $minimum + 1 ) / 255;

Then apply logarithmic scaling to each data element:

map { int( log( $_ - $minimum + 1 ) / $scale ) } @VALS;

This way the minimum value scales to zero and the maximum value scales to 255. Adjust the linear shift inside the log calculation if you need some value smaller than the minimum to scale to zero. Just remember that log(1)=0.
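Putting those steps together, here is a minimal sketch. It assumes the data is a flat list of numbers and takes the scale factor from the shifted maximum, so the largest value lands on 255. The sub name log_scale and the sample values are made up for illustration. Rounding is used instead of plain int() so that floating-point error cannot push the maximum down to 254.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: map each value onto 0..$top on a log scale.
sub log_scale {
    my ($top, @vals) = @_;
    my ($min, $max) = (min(@vals), max(@vals));

    # All values equal: nothing to spread out, so everything maps to 0.
    if ($max == $min) {
        my @zeros = (0) x @vals;
        return @zeros;
    }

    # Shift so the minimum becomes 1 (log(1) = 0), then divide by the
    # log of the shifted maximum so that the maximum maps to $top.
    my $scale = log($max - $min + 1) / $top;

    # Round to the nearest integer bucket.
    return map { int(log($_ - $min + 1) / $scale + 0.5) } @vals;
}

my @scaled = log_scale(255, 0.5, 3, 40, 500, 12_000);
print "@scaled\n";
```

The first element of @scaled is 0 and the last is 255; everything else falls in between in the same order as the input.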

Log2 for heatmap, fun with pack

n1vux on 2008-09-11T00:02:20

I think both of the prior comments are on target.

If you have such quantities of data that scaling and calling log repeatedly is a problem - and only if - there are old integer bitbang routines for log2 that could be done with XS or Inline::C or PDL.
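To show what such an integer routine computes, here is a pure-Perl sketch of floor(log2(n)) by repeated right shifts. The name ilog2 is made up for illustration. In pure Perl this loop is probably no faster than calling log(), which is why the compiled options above are the ones worth considering for speed.

```perl
use strict;
use warnings;

# Hypothetical helper: floor(log2($n)) for integers >= 1, found by
# counting how many times $n can be shifted right before it hits zero.
sub ilog2 {
    my ($n) = @_;
    die "ilog2 needs a value >= 1\n" unless $n >= 1;
    $n = int($n);
    my $bits = 0;
    $bits++ while $n >>= 1;
    return $bits;
}

print ilog2($_), "\n" for 1, 8, 1000;
```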

Or you could (ab)use Perl 5.10 pack() to grab the floating point representation's exponent.

Note that 256 buckets is a lot (high resolution) for log-linear data, unless the data was already floating point or Math::BigFloat, as log2(MAXLONG) - log2(1) is far less than 256. 256 buckets is 8 bits, the size of a plain float's exponent. BUT grabbing it is hard, since it's offset one bit by the sign.

If your data is not log clean but may include the gamut from -INF to 0 to +INF, you can grab the sign and the top 7 bits of exponent. That's nice, but it's not a single numeric range: the exponent is a UCHAR biased by 127 (or by 63 after we nip one bit taking a byte with the sign), while the sign prefixed to it inverts direction. A crass fix:

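my $s;
# First byte of the big-endian float: sign bit plus top 7 exponent bits.
my $n = unpack("C", pack("f>", $data));  # ">" forces big-endian, needed on x86
$n ^= 0xff if $s = $n & 0x80;  # negative: flip all bits so the order reverses
$n ^= 0x80 if !$s;             # non-negative: set top bit to sort above negatives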

If it's all positive, and you want the full 256 buckets without XS or PDL or log(), the best I can see is to unpack with B9, discard the leading sign bit, repack B8, and unpack C. But if you're going to do that, you might as well get the full dynamic range from the 11-bit 'double' exponent.

sub log2{
  require 5.010; # assumes x86 too...
  # should die if arg <= 0 ...
  # instead gives log2(abs())
  my $str =unpack("B12",pack("F>", shift));
     $str =~ s/^[01]/00000/;  # drop sign and pad
  my $exp =  unpack("s>",pack("B*", $str ));
  return -1023 + $exp;
}
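For comparison, the single-float B9/B8 route described before the log2 sub could be sketched like this. The name float_exp_bucket is made up for illustration, and the ">" modifier means it needs Perl 5.10 or later. It returns the raw biased exponent (127 for values in [1, 2)), so real data will usually occupy only a slice of the 0..255 range and may still need a linear rescale.

```perl
use strict;
use warnings;

# Hypothetical helper: the 8-bit biased exponent of a positive number
# after conversion to a single-precision float.
sub float_exp_bucket {
    my ($x) = @_;
    die "positive values only\n" unless $x > 0;

    # Big-endian float; the first 9 bits are the sign plus 8 exponent bits.
    my $bits = unpack("B9", pack("f>", $x));

    # Drop the sign bit, repack the 8 exponent bits as one byte,
    # and read that byte back as an unsigned integer.
    return unpack("C", pack("B8", substr($bits, 1)));
}

print float_exp_bucket($_), "\n" for 0.5, 1, 2, 1000;
```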

With a bit of magic number abuse, we can easily squeeze out a single fractional bit as well, which could give up to 4096 buckets for positive numbers, and as many more for negatives if you rescale somewhere.

Why 5.10? Because on any commodity platform (x86) we need to coerce to a sensible bit/byte order. I am sure I could work out an x86-specific way with older pack, but life is too short.
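One possible reading of the fractional-bit idea, as a sketch only: take 13 bits instead of 12, so the top mantissa bit rides along behind the exponent. The name log2_half is made up for illustration, and "d>" is used here instead of "F>" on the assumption that a plain 64-bit double is wanted regardless of how perl was built. The result moves in steps of 0.5 and is only a coarse approximation: a mantissa of 1.5 or more counts as +0.5, while the true log2(1.5) is about 0.585.

```perl
use strict;
use warnings;

# Hypothetical helper: log2 rounded down to the nearest half step,
# read straight from the bits of a big-endian double.
sub log2_half {
    my ($x) = @_;
    die "positive values only\n" unless $x > 0;

    # Sign bit + 11 exponent bits + top mantissa bit.
    my $bits = unpack("B13", pack("d>", $x));

    # Drop the sign, pad the remaining 12 bits to 16, and read them
    # as an unsigned big-endian short: 2 * biased_exponent + mantissa_bit.
    my $n = unpack("n", pack("B16", "0000" . substr($bits, 1)));

    # Remove the doubled bias (2 * 1023), then halve.
    return ($n - 2046) / 2;
}

print log2_half($_), "\n" for 0.75, 1, 1.5, 3, 10;
```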