Informally, the Avalanche effect says that two similar, but not identical, inputs should produce radically different outputs when fed through a hash function.

I've seen at least two formal definitions of the effect. The first is that if we flip one bit of an input, the two respective outputs should typically be very different, and they are guaranteed not to be the same. The other is that if we flip one bit of an input, the two outputs are statistically independent. I find this confusing: the two definitions contradict each other.

What is the proper definition? And, practically, do modern hash functions like SHA-256 have this property?

To fix ideas, let $H(\cdot)$ be some hash function with $N$ possible outputs. Let $X_1$ be an (integer) random variable, uniformly distributed on some interval. (Assume that $\Pr(H(X_1) = x) = N^{-1}$ for each $x$.) Let $X_2$ be a random variable that differs from $X_1$ in exactly one bit (i.e., the Hamming distance between $X_1$ and $X_2$ is exactly one). What does the avalanche effect say about $\Pr(H(X_2) = H(X_1)\,|\, H(X_1))$?

The first definition says that $\Pr(H(X_2) = H(X_1)\,|\, H(X_1)) = 0$. The second says that $\Pr(H(X_2) = H(X_1)\,|\, H(X_1)) = N^{-1}$. Obviously, both cannot be true.

Fredrik Savje

1 Answer

The proper definition of the avalanche effect is given in the paper by Webster, A. F. and Tavares, S. E., "On the design of S-boxes", Advances in Cryptology - Crypto '85:

For a given transformation to exhibit the avalanche effect, an average of one half of the output bits should change whenever a single input bit is complemented.

It is also often stated per bit: each output bit should have a 50% chance of changing when you change 1 bit of the input.
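
In symbols, as a sketch under notation that is assumed here rather than taken from the paper: let $H:\{0,1\}^n \to \{0,1\}^m$, let $e_i$ be the $n$-bit string with only bit $i$ set, let $\operatorname{wt}$ denote Hamming weight, and let $x$ be uniform over the inputs. The definition quoted above then reads

$$\mathbb{E}_x\big[\operatorname{wt}\big(H(x) \oplus H(x \oplus e_i)\big)\big] = \frac{m}{2} \quad \text{for every } i,$$

while the per-bit version reads

$$\Pr_x\big[H(x)_j \neq H(x \oplus e_i)_j\big] = \frac{1}{2} \quad \text{for every } i \text{ and } j.$$

Both are statements about averages over $x$; neither by itself guarantees $H(x) \neq H(x \oplus e_i)$ for every single $x$.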

This basically means that, to test for the avalanche effect, you have to apply an algorithm similar to the following.

# n = input size in bits, m = output size in bits
# res[j] counts how often output bit j flips
array res[m] = {0, ..., 0}

# iterate over all 2**n inputs
for each input x of size n
   ref_val = H ( x )

   # try every possible 1-bit change of x
   for i in 0 .. n-1
      test_val = H ( x ^ (1 << i) )

      # accumulate statistics per output bit
      for j in 0 .. m-1
         res[j] += ((test_val ^ ref_val) >> j) & 1


# in the end, for all k, res[k] / (n * 2**n) should be around 50%.

Due to the size of the input space, you cannot run this exhaustively for $\operatorname{SHA256}$; you can only test to some statistical precision whether or not it has a strict avalanche effect. By sampling a large number of inputs and flipping each bit once, you can get an estimate. In most cases, though, we analyze the components of the primitive in order to assert whether or not the avalanche effect is respected.
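
As a rough illustration of the sampling approach, here is a minimal Python sketch using the standard `hashlib` module. The helper names (`flip_bit`, `sha256_int`, `avalanche_estimate`), the 8-byte message length, the sample count, and the fixed seed are all assumptions chosen for illustration, not part of any standard test procedure.

```python
import hashlib
import random


def flip_bit(data: bytes, i: int) -> bytes:
    """Return a copy of `data` with bit i flipped (bit i % 8 of byte i // 8)."""
    out = bytearray(data)
    out[i // 8] ^= 1 << (i % 8)
    return bytes(out)


def sha256_int(data: bytes) -> int:
    """SHA-256 digest of `data`, interpreted as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def avalanche_estimate(num_samples=200, msg_bytes=8, out_bits=256, seed=0):
    """Estimate, per output bit, how often it flips after a 1-bit input change.

    Samples `num_samples` random messages (hypothetical parameters) and
    flips each input bit once, as in the exhaustive algorithm above.
    """
    rng = random.Random(seed)  # fixed seed only to make runs repeatable
    n = msg_bytes * 8
    counts = [0] * out_bits
    for _ in range(num_samples):
        x = rng.getrandbits(n).to_bytes(msg_bytes, "big")
        ref = sha256_int(x)
        for i in range(n):
            diff = ref ^ sha256_int(flip_bit(x, i))
            for j in range(out_bits):
                counts[j] += (diff >> j) & 1
    trials = num_samples * n  # number of (input, flipped bit) pairs tested
    return [c / trials for c in counts]


freqs = avalanche_estimate()
mean_freq = sum(freqs) / len(freqs)
print(f"min={min(freqs):.4f} max={max(freqs):.4f} mean={mean_freq:.4f}")
```

If the hash behaves as expected, every entry of `freqs` should be close to 0.5. This only gives statistical evidence over the sampled inputs, not a proof of the property.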

Biv