Testing and randomness

acme on 2004-06-18T08:17:22

Tip for the day: if your test suite uses rand(), then please run your test suite a lot of times before releasing to CPAN. I've been guilty of it in the past, and so have others. Another friendly tip from your friendly orange Perl hacker...


I've had this problem too ...

drhyde on 2004-06-18T10:31:14

... with Net::Random. In this case, I don't use rand(), but I need to test a module whose output is meant to be truly random. Ugh.

Re:I've had this problem too ...

jmason on 2004-06-18T22:45:39

what do you do? map the distribution? ugh, that's a tricky one ;)

Re:I've had this problem too ...

drhyde on 2004-06-18T22:53:55

I cheat. The current tests SUCK.

What I really need to test is that I'm not introducing any bias to the data between me getting truly random data from the source and me presenting it to the programmer. My code manipulates the random data to do things like only produce values within a particular range. I have convinced myself that I'm probably OK, but I *need* regression tests for this.

Example: Tie::Hash::Cannabinol

schwern on 2004-06-29T15:41:37

Are you testing the functionality of the program or are you testing its randomness?

If it's the latter (say you're developing Crypt::Random or checking that /dev/urandom works), Statistics::ChiSquare sounds like the way to go.
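
For a feel of what a chi-square check does, here's the underlying statistic hand-rolled in a few lines. This is a sketch of the math, not Statistics::ChiSquare's actual interface; the bucket counts and thresholds are illustrative.

```perl
use strict;
use warnings;

# Chi-square statistic against a uniform expectation: takes observed
# counts per bucket, returns the statistic (bigger = more biased).
sub chi_square {
    my @observed = @_;
    my $total    = 0;
    $total += $_ for @observed;
    my $expected = $total / @observed;    # uniform null hypothesis
    my $chi2     = 0;
    $chi2 += ( $_ - $expected )**2 / $expected for @observed;
    return $chi2;
}

# Bucket 10_000 rand() draws into 10 bins and eyeball the statistic.
my @counts = (0) x 10;
$counts[ int rand 10 ]++ for 1 .. 10_000;
my $chi2 = chi_square(@counts);
printf "chi-square statistic: %.2f\n", $chi2;

# With 9 degrees of freedom, values far above ~21.7 (the 99th
# percentile) would suggest the generator is biased.
```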

The former is more common. How do I test non-deterministic behavior?

Seeing as how Dave Cross bought my testing services for Tie::Hash::Cannabinol at a YAPC auction way back when, let's use that as an example.

              Tie::Hash::Cannabinol is a completely useless demonstration of how to
              use Tie::StdHash to pervert the behaviour of Perl hashes. Once a hash
              has been "tie"d to Tie::Hash::Cannabinol, there is a 25% chance that it
              will forget anything that you tell it immediately and a further 25%
              chance that it won't be able to retrieve any information you ask it
              for. Any information that it does return will be pulled at random from
              its keys.

              Oh, and the return value from "exists" isn't to be trusted either :)

We have to test:

1) Set doesn't work 25% of the time
2) Retrieval loses values 25% of the time
3) Retrieval can return any value in the hash

First thing to do: what can we test that's not random? Alternatively, can we cancel out the randomness by testing the data against itself?

Do keys(), values() and each() all return the same number of elements?

        my $each_cnt;
        $each_cnt++ while each %stoned_hash;
        is $each_cnt, scalar keys %stoned_hash;
        is $each_cnt, scalar values %stoned_hash;

Is keys() static? Once it's set it shouldn't change.

        my @keys = keys %hash;
        for (1..20) {
                is_deeply [sort keys %hash], [sort @keys];
        }

Now onto the real tests. This gets statistical. I failed my one statistics class, so take everything here with a grain of salt.

Here are the sorts of questions you're trying to ask when writing tests for random data.

1) Is the code using the right probabilities?
2) Are the tests avoiding the degenerate case? (ie. a string of tails when flipping a coin)
3) Are we fully exercising the set of all possible results?

To avoid the low probability degenerate scenarios we can use a large number of iterations relative to the overall probability. Since the probabilities here are pretty high, the number of iterations doesn't have to be that large.

For #1, first decide what sort of epsilon you're willing to accept, since the testing here is statistical. Since this application isn't terribly critical we'll say 10%. The number of iterations you have to run rises rapidly as your epsilon gets smaller.
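
As a self-contained illustration of the epsilon check (simulating a 25% forget rate with plain rand() rather than the real tied hash, so the numbers here are stand-ins):

```perl
use strict;
use warnings;

my $EP    = 0.1;     # the 10% epsilon we settled on
my $iters = 1000;

# Simulate a store that silently "forgets" 25% of the time.
my $forgotten = grep { rand() < 0.25 } 1 .. $iters;

my $observed = $forgotten / $iters;
my $error    = abs( $observed - 0.25 );
printf "observed forget rate %.3f, error %.3f\n", $observed, $error;

# The Test::More version would be: cmp_ok( $error, '<=', $EP );
```

With 1000 iterations the standard deviation of the observed rate is about 0.014, so a deviation beyond 0.1 is vanishingly unlikely; the test is stable despite the randomness.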

Then it largely comes down to creating a set of random data and seeing if it matches your expectations.

The comments in Tie::Hash::Cannabinol sez:

        # Revision 1.3 2001/09/05 19:48:15 dave
        # fixed a very serious bug where instead of returning a random value
        # from the hash we were, in fact, almost always returning C.

Gotta test that bug!

        # Ensure we're forgetting 25% of the time.
        {
                my @values;
                my $iters = 1000;
                push(@values, grep(defined, values %hash)) for 1..$iters;

                # $EP is .1
                cmp_ok( abs( @values - ($iters * keys(%hash) * 3/4) ) / $iters, '<=', $EP);
        }

I don't really know the probabilities of failure for this test and would be interested if someone could show how to work it out.

To handle #2, the odds of a degenerate case decrease rapidly with the number of iterations. With a probability of 25%, 20 iterations gives us odds of 0.25^20, about 9e-13. ie. Pretty bloody unlikely. So 20 iterations should be enough to avoid the degenerate case.
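
The 9e-13 figure is just the per-try probability raised to the number of tries:

```perl
use strict;
use warnings;

# Odds of the degenerate case: 20 consecutive 25%-probability events.
my $odds = 0.25**20;
printf "%.1e\n", $odds;    # about 9.1e-13
```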

If we forget something, do we remember it again?

        TEST: {
                # Forget something...
                1 while defined $hash{3};

                # Try to remember it again.
                for (1..20) {
                        if( defined $hash{3} ) {
                                pass;
                                last TEST;
                        }
                }
                fail;
        }

The loop cannot be proven to terminate, but it's very likely it will after just a few iterations. And the odds of failing to remember the value in 20 iterations is, as mentioned, about 9e-13.

For #3 we want to make sure we're exercising the entire set. Again it's down to running the test enough times. The smaller the set, the fewer iterations you have to do, but I don't know the math for it.
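
(For what it's worth, "how many draws until I've seen every value" is the classic coupon collector problem: the expected number of uniform draws to see all n values is n times the n-th harmonic number. A sketch, with the 4-key hash as the worked example:)

```perl
use strict;
use warnings;

# Coupon collector: expected uniform draws to see all $n distinct
# values is $n * H($n), where H is the harmonic number.
sub expected_draws {
    my $n = shift;
    my $h = 0;
    $h += 1 / $_ for 1 .. $n;
    return $n * $h;
}

# For a 4-key hash: 4 * (1 + 1/2 + 1/3 + 1/4) = 25/3, about 8.33
# draws on average, so 20 iterations gives plenty of headroom.
printf "expected draws for 4 values: %.2f\n", expected_draws(4);
```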

Do we ever get a value outside the range we put in?

        for (1..20) {
                no warnings 'uninitialized';
                like join('', values %hash), qr/^[1-4]*$/;
        }

Generate a big set of values() results, check that nothing is outside of the range. The number of iterations should be adequate to ensure that you've hit every value.

That's the major scenarios I can think of.

The final icing on the cake is repeatability. When a user reports a bug you want to be able to repeat it. To do that you need to know what the random number generator was seeded with.

        BEGIN {
                # Store the seed so we can repeat the test if we fail
                $Seed = $ENV{TIE_HASH_CANNABINOL_SEED} || rand;
                srand($Seed);
        }

        END {
                print STDOUT "# Seed was: $Seed\n";
        }

A user sends you the seed along with the test output and you should be able to repeat the experience.
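
A quick sanity check that re-seeding really does replay the same sequence, which is what makes the reported seed useful:

```perl
use strict;
use warnings;

# Seed, draw five numbers, re-seed with the same value, draw again:
# the two sequences must be identical.
srand(42);
my @first = map { rand } 1 .. 5;

srand(42);
my @second = map { rand } 1 .. 5;

print "@first" eq "@second" ? "repeatable\n" : "NOT repeatable\n";
```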

Re:I've had this problem too ...

wnodom on 2004-06-23T21:28:57

              I need to test a module whose output is meant to be truly
              random.

I've used Statistics::ChiSquare for this in the past, and it worked quite well. It was originally written by the very-large-brained Jon Orwant, and is currently maintained by David Cantrell.

--Bill