You suck at risk management

schwern on 2008-10-04T17:15:14

After yesterday's post on statistical impossibilities and checksum algorithms, I *still* got people telling me that if it's not impossible it might happen, and that relying on it is fundamentally wrong and unreliable.

Look, humans suck at risk management. But this is not a case of software engineers picking something "good enough". This is not some guy at work, or the git folks, slapping together a hashing algorithm, doing some back-of-the-envelope calculations and declaring a collision "impossible".

Mathematicians do not suck at risk management.

This is thousands of cryptographers all over the world poring over SHA-1, trying to break it all the time. This is the NSA doing exhaustive and terrifying analysis because it's a US crypto standard. Finding a crack in SHA-1 not only makes you famous; if you're nefarious, it can make you rich.

Cryptography runs on the idea that there are things which are so improbable they're impossible. So improbable that a thousand other world-shattering things would happen first. Really Smart Guys who know far more math and statistics than either of us put together and cubed figure this stuff out.

If you still don't understand, read this book!

If you are concerned about the commit hashes in git you are sucking at risk management.

Your hard drive is designed with a probability of failure. There is about a 1% chance it will fail in the first year, and the odds go up after that. Every device, electronic or mechanical, has a probability of failure designed into it. Even something as simple as a hammer is designed to fail. That's OK; engineers work with margins of safety. A factor of 4 is good. A factor of 10 is great. Good crypto builds in a factor of about 10^27.
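To put rough numbers on that comparison, here is a minimal sketch in Python. It assumes SHA-1 behaves like a uniformly random 160-bit function for accidental collisions, and it uses a made-up repository size of ten million objects (`NUM_OBJECTS` is a placeholder, not a measurement of any real repository). The 1% drive failure figure is the one quoted above. The collision estimate is the standard birthday bound: the number of object pairs divided by the number of possible hashes.

```python
from fractions import Fraction

SHA1_BITS = 160  # SHA-1 produces a 160-bit digest


def collision_probability(num_objects, bits=SHA1_BITS):
    """Upper bound on the chance that any two of num_objects hashes collide.

    Birthday (union) bound: number of pairs / number of possible hashes.
    Assumes hashes are uniformly random, which only models accidental
    collisions, not a deliberate attack.
    """
    pairs = num_objects * (num_objects - 1) // 2
    return Fraction(pairs, 2 ** bits)


DRIVE_FAILURE_FIRST_YEAR = Fraction(1, 100)  # the 1% figure from the post
NUM_OBJECTS = 10_000_000                     # hypothetical repository size

p_collision = collision_probability(NUM_OBJECTS)
ratio = DRIVE_FAILURE_FIRST_YEAR / p_collision

print(f"accidental collision: about {float(p_collision):.1e}")
print(f"drive failure is about {float(ratio):.1e} times more likely")
```

Change `NUM_OBJECTS` to whatever you think is realistic. Because the pair count grows with the square of the object count, even a repository a thousand times bigger only moves the result by six orders of magnitude.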

People worry about the spectacular but improbable failure and forget about the commonplace but just as lethal failures, because they have gotten used to living with that risk. It's just like someone who worries about commit hash failure but is perfectly happy to use a hard drive they know will fail. Even if you shove a bunch of drives into a hardware RAID, there's still a chance of them all failing before you can swap them out.

To make a sadly all too real analogy, this is like people who want to spend tons of money making every skyscraper plane-proof and fail to notice that many of them aren't even fire-proof. Even worse, by focusing on the spectacular, improbable failure they are drawing resources away from defending against the far more likely failure. That is the true danger of bad risk management, drawing attention away from the real risks.

A system is as risky as its MOST likely risks. Focusing on the least likely risks just wastes time and solves nothing.

Good risk management starts at the MOST likely and MOST dangerous failure, deals with it, and moves on to the next one down the list. Commit hash collision in git is somewhere near the bottom, along with "alien attack" and "spontaneous implosion". You know what's the most likely failure? A simple bug in the code.

Don't worry about someone flying a plane into git. Make it fire-proof instead.


odds of failure * badness of failure

amoore on 2008-10-05T14:03:02

Schwern - This point from your previous post also helps to deflect some of that criticism:

"For those who still don't get it, let's say there is a collision. What happens? We might lose a single commit. And then we report it."

Even in the astronomically rare event of a collision, the worst that happens isn't that bad. If we were gambling the future of the world on this, then it might make sense to consider continuing to focus on improving our odds. But, as it is, we have adequately protected against this risk.
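The "odds of failure * badness of failure" idea can be sketched as a small ranking exercise. Everything here is illustrative: the `Risk` class is a hypothetical helper, and every number except the 1% drive failure figure from the post is a placeholder picked only to show the shape of the calculation, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    odds: float     # probability of the failure happening (per year)
    badness: float  # cost if it does happen, in arbitrary units

    @property
    def exposure(self):
        # odds of failure * badness of failure
        return self.odds * self.badness


# All values are placeholders except the 1% drive failure rate from the post.
risks = [
    Risk("commit hash collision", odds=1e-30, badness=1.0),  # lose one commit
    Risk("hard drive failure", odds=0.01, badness=100.0),
    Risk("simple bug in the code", odds=0.5, badness=10.0),
]

# Deal with the biggest exposure first, then move down the list.
ranked = sorted(risks, key=lambda r: r.exposure, reverse=True)
for risk in ranked:
    print(f"{risk.name}: {risk.exposure:.1e}")
```

With placeholder numbers like these, the collision lands at the bottom of the list. It would take an absurdly large badness value to move it anywhere else.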