SHA-256 doesn't follow a uniform distribution?

Question

I have been playing with SHA-2-256 in Julia and I noticed that the hashes produced don't appear to follow a uniform distribution. My understanding of secure hashing algorithms is that they should approximate a uniform distribution well, so they are not predictable.

Here is the Julia code I'm using:

using BitIntegers, Distributions, HypothesisTests, Random, SHA
function sha256_rounds()
    rounds::Array{Array{UInt8,1}} = Array{Array{UInt8,1}}(undef, 10000) # 10000 samples
    hash::Array{UInt8} = Array{UInt8}(undef, 32) # 32-byte SHA-256 digest
    for i = 1:10000
        hash = sha2_256(string(rand(UInt64), base = 16)) # Random number, convert to hex string, then hash
        rounds[i] = hash
    end
    return rounds
end
sha256_str_vals = [join([string(x, base = 16) for x in y]) for y in sha256_rounds()] # Stitch the bytes together into strings
sha256_num_vals_control = [parse(UInt256, x, base = 16) for x in sha256_str_vals] # Get the numerical value from the strings
OneSampleADTest(sha256_num_vals, Uniform()) # One-sample Anderson-Darling test

And the result of the test:

One sample Anderson-Darling test
--------------------------------
Population details:
    parameter of interest:   not implemented yet
    value under h_0:         NaN
    point estimate:          NaN
Test summary:
    outcome with 95% confidence: reject h_0
    one-sided p-value:           <1e-7
Details:
    number of observations:   10000
    sample mean:              8.73991847621225e75
    sample SD:                2.2742656031884893e76
    A² statistic:             Inf

To me this says that the produced hashes do not conform to a uniform distribution. Am I using the test incorrectly, or is my sample faulty? Thank you for your thoughts.

fgrieu · Answer 1 · 2021-11-28T19:50:03.993

Again, we are not a code review site, especially for code in a language seldom used for cryptography. And there are obvious issues with the code:

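sha256_num_vals_control is computed but never used; presumably the intent was to pass it to OneSampleADTest, which is instead given sha256_num_vals, a variable the code never defines.
I can see neither an attempt to normalize the generated material to the interval $[0,1)$, nor an argument to OneSampleADTest specifying another range (Uniform() is the uniform distribution on $[0,1]$, while the reported sample mean is about $8.7\cdot10^{75}$).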

I conclude the samples for OneSampleADTest are not formatted as expected for this test. Malformed in, garbage out.
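As a sketch of what well-formed input could look like (an assumption about the intent, not the asker's code), each digest can be mapped to $[0,1)$ before testing against Uniform(). The helper names digest_to_unit and sha256_unit_samples are hypothetical, and keeping only the first 53 bits of each digest is one arbitrary normalization choice among many. The block uses only Julia standard libraries (SHA, Random, Statistics); the Anderson-Darling call is left as a comment because it needs the Distributions and HypothesisTests packages from the question. It also shows a second problem in the question's string stitching: string(x, base = 16) drops the leading zero of bytes below 0x10.

```julia
using SHA, Random, Statistics

# Hypothetical helper: map a SHA-256 digest (32 bytes) to a Float64 in [0, 1).
# It reads the first 8 bytes big-endian into a UInt64 and keeps the top 53 bits,
# which a Float64 represents exactly, so the result can never round up to 1.0.
function digest_to_unit(d::AbstractVector{UInt8})
    x = zero(UInt64)
    for b in view(d, 1:8)
        x = (x << 8) | UInt64(b)
    end
    return ldexp(Float64(x >> 11), -53)
end

# Hypothetical helper: n samples in [0, 1), one per SHA-256 digest.
# Inputs mirror the question (hex string of a random UInt64), padded to 16 digits.
function sha256_unit_samples(n::Integer; rng = MersenneTwister(2021))
    u = Vector{Float64}(undef, n)
    for i in 1:n
        input = string(rand(rng, UInt64), base = 16, pad = 16)
        u[i] = digest_to_unit(sha2_256(input))
    end
    return u
end

u = sha256_unit_samples(10_000)
println(extrema(u))         # both values lie in [0, 1)
println((mean(u), var(u)))  # expect roughly 0.5 and 1/12 ≈ 0.0833

# The question's join([string(x, base = 16) for x in y]) drops leading zeros
# from bytes below 0x10, so digests become hex strings of varying length:
println(string(0x0a, base = 16))       # a
println(bytes2hex(UInt8[0x0a, 0xff]))  # 0aff

# With Distributions and HypothesisTests loaded, u could be tested as in the question:
# OneSampleADTest(u, Uniform())   # Uniform() is Uniform(0, 1)
```

bytes2hex from Base keeps two hex digits per byte, so it is a safer way to stitch digests into strings if a string form is needed at all.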

Even if the samples were correctly formatted, cryptography would not care about bugs in OneSampleADTest in a certain version of Julia and the library used. It would care about a valid claim that SHA-256 output, for distinct inputs prepared independently of the constants in SHA-256, can be distinguished from random. But such an extraordinary claim would need extraordinary evidence, and, as a preliminary, a description independent of the language and its libraries.
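For illustration, here is a test whose description fits in one sentence, independent of any statistics library: hash $n$ distinct inputs, count how often each of the 256 byte values occurs in the digests, and compare the counts with the uniform expectation $32n/256$ using Pearson's chi-square statistic. The sketch below assumes the inputs "1" to "n" (an arbitrary choice made independently of SHA-256's constants), and byte_chisq is a hypothetical name. A statistic in the usual range only shows the absence of a gross byte bias; it says nothing about whether SHA-256 can be distinguished from random by cleverer means.

```julia
using SHA

# Hypothetical helper: chi-square goodness-of-fit statistic for the 256 byte
# values over the digests of the n distinct inputs "1", "2", ..., "n".
function byte_chisq(n::Integer)
    counts = zeros(Int, 256)
    for i in 1:n
        for b in sha2_256(string(i))
            counts[Int(b) + 1] += 1   # byte value b goes in bin b + 1
        end
    end
    expected = 32n / 256              # each digest contributes 32 bytes
    return sum((c - expected)^2 / expected for c in counts)
end

chisq = byte_chisq(10_000)
# If the bytes are uniform, chisq approximately follows a chi-square
# distribution with 255 degrees of freedom: mean 255, standard deviation ≈ 22.6.
println(chisq)
```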
