179

First off, I know hashes are one-way. There are an infinite number of inputs that can result in the same hash output. Why can't we take a hash and convert it to an equivalent string that can be hashed back to the original hash output?

eg:

string: "Hello World"
hashed: a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e 

unhash: "rtjwwm689phrw96kvo48rm64unc8oetb5kmrjiuh7h8huhi6dde5n5"
        (a real string that gives the same hash as "Hello World")
hashed: a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e 

...

kodlu
Hello World

7 Answers

272

Take a simple mathematical operation like addition. Addition takes 2 inputs and produces 1 output (the sum of the two inputs). If you know the 2 inputs, the output is easy to calculate - and there's only one answer.

321 + 607 = 928

But if you only know the output, how do you know what the two inputs are?

928 = 119 + 809
928 = 680 + 248
928 = 1 + 927
...
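To make the point concrete, here is a minimal sketch that lists every pair of non-negative integers summing to 928. The function name `pairs_summing_to` is made up for this illustration, and restricting the inputs to non-negative integers is an assumption; allowing negative numbers would make the list infinite.

```python
def pairs_summing_to(total):
    """Return every ordered pair of non-negative integers (a, b) with a + b == total."""
    return [(a, total - a) for a in range(total + 1)]


candidates = pairs_summing_to(928)

# One output, many equally valid input pairs.
print(len(candidates))

# The original pair is in the list, but nothing distinguishes it from the others.
print((321, 607) in candidates)
```

Even for this tiny example the output alone leaves hundreds of candidates, and nothing in the sum says which pair was the original.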

Now you might think that it doesn't matter - if the two inputs sum to the correct value, then they must be correct. But no.

What happens in a real hash function is that hundreds of one-way operations take place sequentially and the results from earlier operations are used in later operations. So when you try to reverse it (and guess the two inputs in a later stage), the only way to tell if the numbers you are guessing are correct is to work all the way back through the hash algorithm.

If you guess the numbers in the later stages wrong, you'll end up with an inconsistency in the earlier stages (like 2 + 2 = 53). And you can't solve it by trial and error, because there are simply too many combinations to guess (more than atoms in the known universe, etc).

In summary, hashing algorithms are specifically designed to perform lots of one-way operations in order to end up with a result that cannot be calculated backwards.

Update

Since this question seems to have attracted some attention, I thought I'd list a few more of the features hashing algorithms use and how they help to make it non-reversible. (As above, these are basic explanations and if you really want to understand, Wikipedia is your friend).

  • Bit dependency: A hash algorithm is designed to ensure that each bit of the output is dependent upon every bit in the input. This prevents anyone from splitting the algorithm up and trying to reverse calculate an input from each bit of the output hash separately. In order to solve just one output bit, you have to know the entire input. In other words, when reversing a hash, it's all or nothing.

  • Avalanching: Related to bit dependency, a change in a single bit in the input (from 0 to 1 or vice-versa) is designed to result in a huge change in the internal state of the algorithm and of the final hash value. Since the output changes so dramatically with each input bit change, this stops people from building up relationships between inputs and outputs (or parts thereof).

  • Non-linearity: Hashing algorithms always contain non-linear operations - this prevents people from using linear algebra techniques to "solve" the input from a given output. Note the addition example I use above is a linear operation; building a hash algorithm using just addition operators is a really bad idea! In reality, hashing algorithms use many combinations of linear and non-linear operations.
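The avalanche behaviour described in the list above can be observed directly. The sketch below uses SHA-256 from Python's standard `hashlib` module, flips a single input bit, and counts how many of the 256 output bits differ. The helper names and the choice of input are assumptions made for the illustration.

```python
import hashlib


def sha256_as_int(data: bytes) -> int:
    """Return the SHA-256 digest of data as one 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def differing_output_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which the two SHA-256 digests differ."""
    return bin(sha256_as_int(a) ^ sha256_as_int(b)).count("1")


original = b"Hello World"
# Flip only the lowest bit of the first byte: "H" (0x48) becomes "I" (0x49).
flipped = bytes([original[0] ^ 0x01]) + original[1:]

changed = differing_output_bits(original, flipped)
print(f"{changed} of 256 output bits changed")
```

For a well-behaved hash the count should land somewhere near half of the output bits, which is what makes it hard to relate individual input bits to individual output bits.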

All of this adds up to a situation where the easiest way of finding a matching hash is just to guess a different input, hash it and see if it matches.
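A rough sketch of that guess-and-check approach follows. To keep it fast, it only requires the first 2 bytes of the SHA-256 digest to match; that truncation, the `guess-N` candidate format, and the function names are all assumptions of this demo, not part of any real attack.

```python
import hashlib

PREFIX_BYTES = 2  # assumption: only 16 bits must match, so the demo finishes quickly


def truncated_hash(data: bytes) -> bytes:
    """Return the first PREFIX_BYTES bytes of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[:PREFIX_BYTES]


def guess_matching_input(target: bytes, max_tries: int = 2_000_000):
    """Hash candidate inputs one after another until one matches the target."""
    for n in range(max_tries):
        candidate = f"guess-{n}".encode()
        if truncated_hash(candidate) == target:
            return candidate, n + 1
    return None, max_tries


target = truncated_hash(b"Hello World")
match, tries = guess_matching_input(target)
print(match, tries)
```

Each additional byte that has to match multiplies the expected number of guesses by 256, so matching all 32 bytes of a SHA-256 digest this way is out of reach.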

Lastly, if you really want to know how hard reversing a hash is, there's no substitute for trying it out yourself. All good hashing algorithms are openly published and you can find plenty of code samples. Take one and try to code a version that reverses each step; you'll quickly discover why it's so hard.

adelphus
42

Cryptographically secure hashes were specifically built to (among other things) make what you're asking hard!

Now, you could try to create an appropriate dictionary of all hashes, hoping to find appropriate pairs... but it would take more storage space than the total storage space that's currently available on our planet and more computing power than you'll be able to get access to in this universe (at least, at the time of writing this), which is why we call it "infeasible".
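As a back-of-the-envelope check on the storage claim, the sketch below counts the bytes needed to store one 32-byte digest for every possible SHA-256 value. It assumes a 256-bit hash and ignores the space for the matching inputs, so it is a lower bound.

```python
DIGEST_BYTES = 32                      # SHA-256 output size in bytes
entries = 2 ** (8 * DIGEST_BYTES)      # one table entry per possible hash value
table_bytes = entries * DIGEST_BYTES   # digests only; the stored inputs would add more

# Report the order of magnitude by counting decimal digits.
print(f"about 10^{len(str(table_bytes)) - 1} bytes")
```

For scale, a zettabyte is $10^{21}$ bytes, so the table would be dozens of orders of magnitude larger than that.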

In your theoretical example, the collision would be the strings "Hello World" and "rtjwwm689phrw96kvo48rm64..." both producing the same hash a591a6d40bf420404a011733...

For SHA-2 and SHA-3, no such pairs are known to date. If one were found, the (once cryptographically secure) hash would have to be considered broken due to collisions.

Mike Edward Moras
26

Strictly speaking, you can, and it stands to reason that you can.

A SHA-1 hash has $2^{160}$ possible values. If we just consider $100$ byte binary plaintexts, well, there are $2^{800}$ possible ones of those. So it stands to reason that for any SHA-1 hash, there are likely to be around $2^{640}$ $100$ byte binary plaintexts that would match it.*
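The same counting argument can be watched at a small scale. In the sketch below, the "hash" is just the first byte of SHA-1 (an assumption made so that there are only 256 possible outputs), and the inputs are all 65,536 two-byte strings, so on average 256 inputs share each output.

```python
import hashlib
from collections import Counter


def tiny_hash(data: bytes) -> int:
    """Toy 8-bit hash: the first byte of the SHA-1 digest (demo assumption)."""
    return hashlib.sha1(data).digest()[0]


# Hash every possible 2-byte input and count how many land on each output value.
buckets = Counter(tiny_hash(n.to_bytes(2, "big")) for n in range(2 ** 16))

print(len(buckets), min(buckets.values()), max(buckets.values()))

# The full-size version of the same division, done with exact integers.
print((2 ** 800) // (2 ** 160) == 2 ** 640)
```

With more inputs than outputs, some output must be shared by many inputs; the full-size case is the same division with much larger exponents.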

When two inputs have the same hash, it's called a hash collision. For non-secure hashes, it's not particularly difficult to find a collision. It's not even a design goal. For example, Java classes often have a hashCode() method, which generates a hash used to facilitate data structures like a HashMap. But these use algorithms designed to be cheap to run, which produce few accidental collisions. If you want to deliberately craft two objects with the same hashcode, it's usually easy.
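As an illustration of how easy deliberate collisions are for such hashes, the sketch below re-implements in Python the algorithm documented for Java's `String.hashCode()` (multiply by 31 and add the next character, with 32-bit wraparound). The function name is made up here, and the port assumes characters in the Basic Multilingual Plane, where Python code points and Java `char` values coincide.

```python
def java_string_hashcode(s: str) -> int:
    """Python port of the algorithm documented for java.lang.String.hashCode()."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like a Java int
    # Reinterpret the 32-bit value as a signed integer, as Java would.
    return h - (1 << 32) if h >= (1 << 31) else h


# "Aa" and "BB" collide: 65*31 + 97 == 66*31 + 66 == 2112.
print(java_string_hashcode("Aa"), java_string_hashcode("BB"))

# Because the hash is built up character by character, colliding blocks can be
# concatenated to produce as many colliding strings as desired.
print({java_string_hashcode(s) for s in ("AaAa", "AaBB", "BBAa", "BBBB")})
```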

Cryptographic hashes are designed, not to make collisions impossible, but to make them extremely difficult to find. That is, if your goal is to find an input that generates a given hash, there should be no way to do it that's faster than brute force -- trying every input in turn until one works.

The maths behind this are well documented -- find a book if you want to; this is not the place to explain it (nor would I be able to).

... and not every collision is useful. Consider a signed email message. There might be a lot of chunks of data that yield the same hash. But only a tiny subset of those look like text. And only a tiny subset of those look like English text. And probably only one of those looks like English text that the purported sender could plausibly have written.

So, you can find collisions using brute force, but brute force necessarily takes a long time, and that's what gives you security. The best cryptographic security is designed such that brute forcing would take longer than the age of the planet (possibly universe!), on our fastest computers.

For example, since there are $2^{256}$ possible SHA256 hashes, you would have roughly a $1/2$ probability of finding one collision if you tried $2^{255}$ different inputs. At a microsecond per attempt, this would take on the order of $10^{63}$ years.
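The estimate can be reproduced with a few lines of arithmetic. The one microsecond per attempt comes from the text above; using a 365.25-day year is an assumption of this sketch.

```python
attempts = 2 ** 255                      # number of inputs tried
seconds = attempts * 1e-6                # one microsecond per attempt
years = seconds / (365.25 * 24 * 3600)   # seconds in a 365.25-day year

# Prints a figure on the order of 10^63 years.
print(f"{years:.1e} years")
```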

(There are various ways you can improve these odds, for example you could double your chances by searching for two target hashes at the same time -- but the numbers are still huge, and as computer power increases we just move on to longer hashes).

A cryptographic hash is considered to be theoretically broken if anyone finds a way to find a hash collision that's more efficient than brute force.

But even weaker algorithms provide security -- if someone gets hold of my password hash, then spends 5 years finding a collision, well, never mind, I have changed my password by then.


* - I chose SHA-1 for this first example because the hash is shorter than those of more current algorithms and we get some easy-to-understand numbers out of it. Note though, that the shortness of the hashcode isn't the only thing wrong with SHA-1; it has flaws such that brute-force isn't the only way to find collisions.

slim
21

I'm taking a guess at where your confusion stems from.

The one-way-ness of hash functions does not relate to the mathematical property of being a non-injective function.

A function $f$ that is injective will have different values $f(x), f(y)$ for all $x \neq y$. And indeed hash functions are usually non-injective (this can easily be derived from the fact that their domain is bigger than their codomain). But that is not the meaning of one-way.

Instead, saying that a hash function is one-way specifically precludes the thing you want to do, which is to find a value $x$ such that $H(x) = y$ if you already have $y$. In other words, given $x$ you can calculate $H(x)$, but going backwards is impossible. Hence, "one-way".

Of course the simple answer, as already given by e-sushi is: Because they are constructed so that it's impossible. :)

Elias
9

We don't actually know if we cannot reverse hashes. There is no mathematical proof that reversing hashes is hard. Reversing hashes is in FNP, therefore any such proof would be a strong result about hardness of NP (hardness of FNP and NP is trivially linked).

The practical impossibility of reversing hashes (the cryptographically strong ones) stems from the algorithms being designed to remove known (and hypothetical) weaknesses that would make it easy to reverse them.

1

This answer was merged here, and targets a slightly different question:

if a hashing algorithm is used to hash a password, and (using the same hash alg such as md5) the same hash will always be produced from the same password, why can’t you just reverse the hashing algorithm to crack the hash?

The premise in this reasoning is that knowing the output $y$ of some computable function $F$ for some input $x$ we know nothing about, it's possible to find that input $x$. This premise is unwarranted, and wrong.

For a start, there might be many different inputs $x$ that yield the same output $y$, making it very unlikely to find the correct input $x$ from $y$. In this case we must limit our hope to at best: given the output $y$ for unknown input $x$, find some input $x'$ with $y=F(x')$. One example function $F$ where that's easy is the one that repeats the last byte of its input 16 times.
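A minimal sketch of that easy-to-invert example follows. The names `F` and `find_preimage` are chosen for this illustration, and a non-empty input is assumed.

```python
def F(x: bytes) -> bytes:
    """Repeat the last byte of a non-empty input 16 times."""
    return x[-1:] * 16


def find_preimage(y: bytes) -> bytes:
    """Return some input x' with F(x') == y; a single byte is enough."""
    return y[:1]


y = F(b"correct horse")
x_prime = find_preimage(y)

# x' reproduces the output even though it is not the original input.
print(x_prime, F(x_prime) == y)
```

Finding some matching input is trivial here, while recovering the original input is not possible from $y$ alone, which is exactly the distinction drawn above.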

More importantly, we know how to build efficiently computable functions that we don't know how to efficiently reverse. One example for 16-byte input $x$ and output $y$ is the function $F$ defined as : $$x\mapsto y=F(x)=E(x)\oplus x$$ where $E$ is AES-256 encryption with the all-zero key, and $\oplus$ is bitwise eXclusive OR. Picture $E$ as some public, rather arbitrary bijection of 16-byte strings, that we know to efficiently compute in both the forward and backward directions.

$E$ and its inverse $E^{-1}$ are easy to compute, but this is of no direct help to invert $F$, because given $y$ we know neither $x$ nor $E(x)$, thus we do not know on what to apply $E$ or $E^{-1}$.

fgrieu
-7
  1. Clarification: The question has a flawed assumption. I mistakenly thought that in addressing the details of that flaw my answer was making that obvious. Just to make it really clear:

    YOU CAN REVERSE SOME HASHES!!! BEING ONE-WAY IS NOT A REQUIREMENT OR CONCERN OF HASH FUNCTIONS!!! AND EVEN FOR THE SUBSET OF CRYPTOGRAPHIC HASHES IT ISN'T 100% GUARANTEED IRREVERSIBLE!

    Your mileage will vary depending on circumstances, but given the right circumstances you can reverse any hash with relative ease (the key in cryptography is denying the person trying to reverse your hashes sufficient information to do so).

  2. Clarification - The question possibly was talking about cryptographic hashes, but did not say so. Just like all dogs are animals, but not all animals are dogs, cryptographic hashes are a subset of hashes, and there are many hashes in general use that are not appropriate to use as cryptographic hashes.

    I can think in my head of a number of ways of making a useful hash function that would not be hard to reverse. You could also use the private/public ssh key pairs to make a hash that is reversible if you have the other key, but not otherwise.

The original answer goes on to explain what "hash function" really means (and being one way / irreversible is not a requirement for a hash function):

Hash in computer science was originally used for "Hash" tables and was concerned with distributing a non-uniformly spread input set across a limited output set for efficient indexing. Such hashes are generally simplistic for fast execution, and are typically not cryptographically strong.

(A moderately dumb hash function can be as simple as taking the input as a number and getting the modulus of it using a prime number - this means all of the input bits affect the output result, but one possible input value is simply the hash as a bit string with zero padding on the left out to a byte boundary).
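Here is a small sketch of such a modulus hash and its trivial reversal. The prime 65521 and the function names are arbitrary choices made for this illustration.

```python
P = 65521  # assumption: an arbitrary prime modulus picked for the demo


def mod_hash(data: bytes) -> int:
    """Interpret the input as a big-endian number and reduce it modulo P."""
    return int.from_bytes(data, "big") % P


def trivial_preimage(h: int) -> bytes:
    """Encode the hash value itself as bytes, zero-padded on the left to a byte boundary."""
    length = max(1, (h.bit_length() + 7) // 8)
    return h.to_bytes(length, "big")


h = mod_hash(b"Hello World")
x_prime = trivial_preimage(h)

# The hash value, written out as bytes, hashes to itself because h < P.
print(h, x_prime, mod_hash(x_prime) == h)
```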

Wikipedia has a useful short article: https://en.wikipedia.org/wiki/Cryptographic_hash_function

A cryptographic hash function is a special class of hash function that has certain properties which make it suitable for use in cryptography.

Useful reading - it goes into more detail of the reversibility of hash functions intended to be hard to reverse. (Nothing is irreversible given sufficient time and processing power - you could just iterate through all possibilities - you just try to make the effort harder than it's worth.)

see also https://security.stackexchange.com/questions/63052/reversible-hash-function "Is there any reversible hash function?"

Mike Edward Moras
iheggie