
I'm working on a system (Ethereum) where it is significantly cheaper to store 32 bytes than 33 bytes. I'd like to create a table where data is stored based on its hash.

SHA-256 would meet this criterion, since it outputs 32 bytes.

However, I'd also like to include a "version" byte in case I need to change the hash algorithm in the future. This would require 33 bytes.

One simple solution is to chop off the last byte of the hash and use only the first 31 bytes for the lookup.

  1. Does this bias the hash in any way?
  2. My assumption is this would decrease the computational power needed to reverse the hash by 1/256th. Is that correct?
  3. My assumption is this would increase the likelihood of a hash collision by 25600%. Is that correct?
kelalaka
Akhil F

1 Answer

  1. Does this bias the hash in any way?

We want the hash to satisfy the avalanche criterion on its output bits: changing any single input bit should flip each output bit with probability 1/2, so about half of the output bits change on average. Every output bit depends on the input bits, and dropping some output bits does not affect the remaining ones, so truncation does not introduce a bias.
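
As a rough empirical illustration of the avalanche behaviour on the truncated output, here is a minimal Python sketch. The helper names (`truncated_sha256`, `hamming_distance`, `average_flipped_bits`) and the sample message are my own, chosen only for this example. It flips every input bit in turn and averages how many of the 248 remaining output bits change; a value near 124 is what one would expect from an unbiased hash. This is a sanity check, not a proof.

```python
import hashlib


def truncated_sha256(data: bytes, n_bytes: int = 31) -> bytes:
    """Return the first n_bytes of SHA-256(data)."""
    return hashlib.sha256(data).digest()[:n_bytes]


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def average_flipped_bits(message: bytes, n_bytes: int = 31) -> float:
    """Flip each input bit in turn; average the number of output bits that change."""
    base = truncated_sha256(message, n_bytes)
    n_bits = len(message) * 8
    total = 0
    for i in range(n_bits):
        flipped = bytearray(message)
        flipped[i // 8] ^= 1 << (i % 8)  # flip exactly one input bit
        total += hamming_distance(base, truncated_sha256(bytes(flipped), n_bytes))
    return total / n_bits


# Arbitrary sample message, used only for illustration.
avg = average_flipped_bits(b"example input for the avalanche test")
print(f"average changed output bits: {avg:.1f} of {31 * 8}")
```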

  2. My assumption is this would decrease the computational power needed to reverse the hash by 1/256th. Is that correct?

First of all, hash functions are not really reversible, since they compress: they map an arbitrarily large input space onto a much smaller, fixed-size output space.

$$ H:\{0,1\}^* \rightarrow \{0,1\}^l$$

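If we want to talk about collision resistance, see the next answer. For generic pre-image search, yes: dropping 8 output bits reduces the expected work from about $2^{256}$ to about $2^{248}$ hash evaluations, a factor of $2^8 = 256$, which is still far out of reach.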

  3. My assumption is this would increase the likelihood of a hash collision by 25600%. Is that correct?

Collision resistance is measured against the generic birthday attack, which costs about $\sqrt {2^l}$ hash evaluations, where $l$ is the output size of the hash function in bits. For SHA-256 this is $\sqrt {2^{256}} = 2^{128}$.

In your case the generic birthday attack costs $\sqrt {2^{256-8}} = 2^{124}$. Thus, the attacker gains a speed-up of $2^{4}= 16$ in the attack time.
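
The arithmetic above can be checked with a short Python sketch. The function names are my own, and the collision probability uses the standard birthday approximation $n(n-1)/2^{l+1}$, which is only meaningful while the result is much smaller than 1. It shows two different ratios: the attacker's work drops by a factor of 16, while the collision probability for a fixed number of stored hashes grows by a factor of 256, which is roughly the 25600% figure from the question.

```python
from fractions import Fraction


def birthday_work_log2(output_bits: int) -> float:
    """log2 of the generic birthday attack cost, sqrt(2^l) = 2^(l/2)."""
    return output_bits / 2


def collision_probability(num_hashes: int, output_bits: int) -> Fraction:
    """Birthday approximation n(n-1) / 2^(l+1); valid only while it is << 1."""
    return Fraction(num_hashes * (num_hashes - 1), 2 ** (output_bits + 1))


FULL_BITS = 256       # full SHA-256 output
TRUNCATED_BITS = 248  # 31 bytes kept

# Attack-time speed-up from truncation: 2^(128 - 124).
speedup = 2 ** (birthday_work_log2(FULL_BITS) - birthday_work_log2(TRUNCATED_BITS))

# Collision probability ratio for the same number of stored items.
n = 2 ** 40  # hypothetical table size, chosen only for illustration
ratio = collision_probability(n, TRUNCATED_BITS) / collision_probability(n, FULL_BITS)

print("birthday attack speed-up:", speedup)
print("collision probability ratio:", ratio)
```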

TL;DR: Truncating the hash to 31 bytes will be safe (see also this Stack Exchange answer).
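
For concreteness, a 32-byte lookup key of the kind described in the question could be built as below. This is a minimal off-chain sketch in Python, not contract code: the name `make_key`, the constant `HASH_VERSION_SHA256`, the value `0x01`, and placing the version byte first are all assumptions of mine, not something the question specifies.

```python
import hashlib

# Hypothetical version tag for "SHA-256 truncated to 31 bytes".
HASH_VERSION_SHA256 = 0x01


def make_key(data: bytes, version: int = HASH_VERSION_SHA256) -> bytes:
    """Build a 32-byte key: 1 version byte followed by the first 31 hash bytes."""
    if version != HASH_VERSION_SHA256:
        raise ValueError(f"unsupported hash version: {version}")
    digest = hashlib.sha256(data).digest()  # 32 bytes
    return bytes([version]) + digest[:31]   # drop the last digest byte


key = make_key(b"some stored value")
print(len(key), key.hex())
```

Putting the version byte first keeps keys from different hash algorithms in disjoint ranges, so a future algorithm change cannot collide with existing entries.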


Note 1: Bitcoin miners reached $\approx 2^{92}$ SHA-256 hashes per year as of 6 August 2019.

Note 2: SHA-224, defined in FIPS 180-4, is computed like SHA-256 but with different initial constants (for domain separation) and with the output truncated to 224 bits, so a SHA-224 digest is not the same as the first 28 bytes of the SHA-256 digest of the same input.
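
Note 2 is easy to confirm with Python's standard `hashlib`. The helper name below is my own and the input `b"abc"` is just a sample:

```python
import hashlib


def sha224_differs_from_truncated_sha256(data: bytes) -> bool:
    """Compare SHA-224(data) with the first 28 bytes of SHA-256(data)."""
    sha224_digest = hashlib.sha224(data).digest()          # 28 bytes
    sha256_prefix = hashlib.sha256(data).digest()[:28]     # 28 bytes
    return sha224_digest != sha256_prefix


print(sha224_differs_from_truncated_sha256(b"abc"))
```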

kelalaka