Can one use a reversible hash algorithm as a compression function?

Question

As we know, SHA-256 maps any input to a 64-character hexadecimal output. Is it possible to create a hash that does the same thing as SHA-256 but can be reversed, so that given the 64-hex-character output we can recover the original input, such as "i love programming"? This would be a cool way to compress huge texts. Is it possible?

forest · Answer 1 · 2022-04-11T20:36:04.033

Strictly speaking, all hash functions are compressing since the output can be smaller than the input, but I imagine you're asking about compressing data that can later be losslessly decompressed.

This is impossible due to the pigeonhole principle. The fact that the fixed output space of a hash algorithm is smaller than the input space means that there will always be multiple possible inputs for the same output. Even if you could find a preimage (i.e. "reverse" the hash), you would not know whether it is the original preimage or just a preimage. If the input were sufficiently small (smaller than the hash), it might be possible, since you could discard larger preimages and keep only the finite number of small ones that make sense (e.g. only valid ASCII strings), but then it's not compression.
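A minimal sketch of the pigeonhole argument, assuming a toy 8-bit "hash" built by keeping only the first byte of SHA-256 (`tiny_hash` is a hypothetical name used for illustration only). With 1000 inputs and only 256 possible outputs, some output must be shared by several inputs:

```python
import hashlib
from collections import defaultdict

def tiny_hash(data: bytes) -> int:
    """Toy 8-bit hash: the first byte of SHA-256 (illustration only)."""
    return hashlib.sha256(data).digest()[0]

# Hash 1000 distinct inputs into at most 256 possible outputs.
buckets = defaultdict(list)
for i in range(1000):
    msg = f"message {i}".encode()
    buckets[tiny_hash(msg)].append(msg)

# More inputs than outputs, so at least one output has several preimages.
largest = max(buckets.values(), key=len)
print(len(buckets), "distinct outputs for 1000 inputs")
print("one output is shared by", len(largest), "inputs, e.g.", largest[:2])
```

Given only the shared output value, nothing tells you which of those inputs was the original. The same holds for the full 256-bit output, just with collisions that are infeasible to find in practice.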

As an extreme example, imagine a "hash function" with a trivial preimage attack: a 1-bit CRC with the polynomial x + 1 (i.e. an even parity bit). If I give you the output of this function and the output is 1, you will have absolutely no idea what the input was. You can compute an input, but you can't find the input. For a 1-bit hash, half of all possible inputs in the input space map to that same output!
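The parity example can be checked by brute force. This is a small sketch that enumerates every 8-bit input (the input width `n = 8` is an arbitrary choice for illustration) and counts how many map to the output 1:

```python
from itertools import product

def parity_bit(bits) -> int:
    """Even-parity bit: the XOR of all input bits."""
    p = 0
    for b in bits:
        p ^= b
    return p

n = 8  # input width in bits, chosen only to keep the enumeration small
inputs = list(product([0, 1], repeat=n))
ones = [x for x in inputs if parity_bit(x) == 1]

print(len(ones), "of", len(inputs), "inputs hash to 1")
# prints: 128 of 256 inputs hash to 1
```

Any of those 128 inputs is a valid preimage of 1, so knowing the output narrows the input down by only a single bit.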

This impossibility is the basis of the popular joke from Schneier Facts:

For Bruce Schneier, SHA-1 is merely a compression algorithm.

kelalaka · Answer 2 · 2021-02-21T22:15:39.103

If the question was about (current form)

Reversible cryptographic hash functions

Then No!

The one-wayness (preimage resistance) property of cryptographically secure hash functions prevents that. Hash functions don't use keys, so if one person could reverse the hash, everybody could, and there would be no secure hash function at all.

Besides, it is mathematically impossible, too: hash functions digest arbitrarily large inputs into a fixed-size output of $\ell$ bits,

$$H:\{0,1\}^* \to \{0,1\}^\ell$$

Reversibility requires the function to be 1-1 and onto. You cannot reverse a function that is not 1-1, and cryptographic hash functions are clearly not 1-1 because of the fixed output size. Being 1-1 is a bad property for a cryptographic hash function; what is described is a permutation.

This can be seen clearly with the pigeonhole principle: you have a fixed number of holes for arbitrarily many pigeons. Therefore at least one hole will contain more than one pigeon. When you try to map back, which pigeon will you choose? Fail!
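As a rough count, using the same $H$ and $\ell$ as above and restricting attention to inputs of exactly $n$ bits with $n > \ell$ (the choice $n = 512$ below is only an illustrative assumption):

$$\left|\{0,1\}^{n}\right| = 2^{n} > 2^{\ell} = \left|\{0,1\}^{\ell}\right|$$

so on average each digest has

$$\frac{2^{n}}{2^{\ell}} = 2^{\,n-\ell}$$

preimages among the $n$-bit inputs. For $\ell = 256$ and $n = 512$ that is $2^{256}$ candidate inputs per digest, on average.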

Also, a cryptographic hash function needs to mangle its input to shrink it to the output size, and that causes a loss of information, for example through the AND operation ($\wedge$), which is not reversible.

Therefore, what you need is just the opposite of what we want from cryptographically secure hash functions. Finding the pre-image will fail!

If the question was about (the first revision)

compressible encryption;

Then No!

Encryption schemes, unlike hash functions, are reversible operations. Therefore the output space must be at least as large as the input space.

If you want to compress, do it before encryption. Then you can decompress after decryption.

$$c = E_k(\operatorname{compress}(m)) \quad \text{and} \quad m = \operatorname{decompress}(D_k(c))$$
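The order of operations can be sketched with `zlib` from the Python standard library. Note the assumption: `toy_stream_xor` is a hypothetical, unvetted stand-in for $E_k$ and $D_k$ (a SHA-256 counter keystream XORed with the data), used only so the example runs without third-party packages. It is not a recommendation for real encryption.

```python
import hashlib
import zlib

def toy_stream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stand-in for E_k / D_k: XOR with a SHA-256 counter keystream.
    NOT a vetted cipher; it only illustrates the data flow."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(a ^ b for a, b in zip(chunk, block))
    return bytes(out)

key = b"demo key, not secret"   # hypothetical demo values
nonce = b"demo nonce"
m = b"i love programming " * 200

# c = E_k(compress(m))
c = toy_stream_xor(key, nonce, zlib.compress(m))
# m = decompress(D_k(c)); XOR with the same keystream decrypts
recovered = zlib.decompress(toy_stream_xor(key, nonce, c))

# Wrong order: the ciphertext looks random, so compressing it gains nothing.
wrong_order = zlib.compress(toy_stream_xor(key, nonce, m))

print(len(m), len(c), len(wrong_order), recovered == m)
```

Compressing first shrinks the repetitive message substantially, while compressing the ciphertext leaves it at roughly its original size.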

However, note that compression before encryption can be problematic, as in CRIME (Compression Ratio Info-leak Made Easy):

CRIME (Compression Ratio Info-leak Made Easy) is a security exploit against secret web cookies over connections using the HTTPS and SPDY protocols that also use data compression. When used to recover the content of secret authentication cookies, it allows an attacker to perform session hijacking on an authenticated web session, allowing the launching of further attacks. CRIME was assigned CVE-2012-4929.

The original paper: John Kelsey, "Compression and Information Leakage of Plaintext", 2002

and the follow-up BREACH

BREACH (a backronym: Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext) is a security exploit against HTTPS when using HTTP compression. BREACH is built based on the CRIME security exploit. BREACH was announced at the August 2013 Black Hat conference by security researchers Angelo Prado, Neal Harris and Yoel Gluck.
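The length leak behind these attacks can be sketched with `zlib` alone. This is a simplified illustration, not the actual CRIME or BREACH procedure: the cookie value and the helper name `observed_length` are hypothetical, and it assumes the attacker can inject data that is compressed together with the secret and can observe the resulting length (encryption hides the content but not the size).

```python
import zlib

# Hypothetical secret header that the attacker cannot read directly.
SECRET = b"Cookie: session=7f3a9c41d2e8b605"

def observed_length(attacker_data: bytes) -> int:
    """Size of secret + attacker-controlled data after compression.
    A length-preserving cipher would expose this size on the wire."""
    return len(zlib.compress(SECRET + b"\r\n" + attacker_data))

right = observed_length(b"Cookie: session=7f3a9c41d2e8b605")
wrong = observed_length(b"Cookie: session=q8Zk1LmX0vTyRw5N")

# A guess that repeats the secret compresses better than one that does not.
print("correct guess:", right, "bytes; wrong guess:", wrong, "bytes")
```

The real attacks refine a guess piece by piece rather than testing whole values, but they rely on the same effect: a better guess yields a shorter compressed output.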

Another problem is achieving side-channel-free compression, as noted by Hola. If a side-channel attack is possible, one should take this into account, too.

Therefore, if you want to use compression before encryption, analyze that decision seriously.

Ironically, modern encryption schemes do the reverse (they increase the size a little) for security reasons. Block and stream ciphers need an IV/nonce to achieve IND-CPA security. Modern authenticated encryption modes like AES-GCM and ChaCha20-Poly1305 also produce an authentication tag, which increases the output size, too.
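A back-of-the-envelope sketch of that expansion, assuming the commonly used sizes of a 96-bit nonce and a 128-bit tag for both AES-GCM and ChaCha20-Poly1305, and assuming the nonce is transmitted alongside the ciphertext (`aead_output_size` is a hypothetical helper, not a library function):

```python
NONCE_BYTES = 12  # assumed 96-bit nonce sent with the message
TAG_BYTES = 16    # assumed 128-bit authentication tag

def aead_output_size(plaintext_len: int) -> int:
    """Bytes on the wire: nonce + ciphertext (same length as plaintext) + tag."""
    return NONCE_BYTES + plaintext_len + TAG_BYTES

for size in (0, 100, 4096):
    print(size, "->", aead_output_size(size))
```

Under these assumptions the overhead is a constant 28 bytes per message, regardless of the plaintext length.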

Anonymous · Answer 3 · 2021-02-21T06:52:35.367

Yes, it is possible provided you allow for hashes that are not necessarily a fixed length. This is essentially what content based addressing is. The content can be "any text, file, or byte" and it is addressed by a hash of it. For an example of this you can check out IPFS. You can share a short address of a file with someone else and they can decompress that hash back into the original piece of content. This works by maintaining a mapping between these addresses and the content.
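The mapping idea can be sketched with an in-memory dictionary. This is only an illustration of content-based addressing in general, not of how IPFS is implemented; the class name `ContentStore` and its methods are hypothetical.

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: address = SHA-256 hex digest of the content."""

    def __init__(self):
        self._blobs = {}  # address -> original content

    def put(self, content: bytes) -> str:
        address = hashlib.sha256(content).hexdigest()
        self._blobs[address] = content
        return address

    def get(self, address: str) -> bytes:
        # "Reversing" the hash is a lookup; unknown addresses raise KeyError.
        return self._blobs[address]

store = ContentStore()
addr = store.put(b"i love programming")
print(addr)             # 64 hex characters
print(store.get(addr))  # the original bytes, fetched from the mapping
```

The short address can be shared in place of the content, but the content itself still has to be stored in full by whoever maintains the mapping.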

While forest says that this is impossible due to the pigeonhole principle when using a fixed size hash output, IPFS gets around this by being able to support an arbitrary number of hash functions which can each have an arbitrary length. By varying the hash function or length you will always be able to find an address that is not currently used.
