39

I was wondering whether there is any string whose hash is equal to itself, so that, when using any (non-specific) hash function, the hash would be equal to that string?

so that:

hash(x) = x

Note that this is not an assignment or anything. I'm just curious and couldn't find any specific answer or reference, and I'm not sure how to go about proving or disproving this myself!

Mostafa Berg

4 Answers

43

I restrict to hash functions $H$ with an output of some fixed size $n\ge1$ bit(s), accepting as input some strings, including all $n$-bit strings; MD5 (resp. SHA-1, SHA-256) is an example of such function for $n=128$ (resp. $n=160$, $n=256$).

Whether there exists a solution to $H(x)=x$ depends on the particular hash function. If $H$ is a random function (as MD5, SHA-1, and SHA-256 aim to be), the answer is YES with probability close to $63.2\%$ for practical values of $n$.

More precisely: $H(x)=x$ can hold only if $x$ has exactly $n$ bits. There are $2^n$ values of $x$ that satisfy the latter condition, and restricting $H$ to such $x$ there are $(2^n)^{(2^n)}$ different functions $H$, of which $(2^n-1)^{(2^n)}$ are such that $H(x)=x$ has no solution. Therefore, if we choose one $H$ uniformly at random, the probability is exactly $1-{(2^n-1)^{(2^n)}\over(2^n)^{(2^n)}}=1-(1-2^{-n})^{(2^n)}$ that we picked an $H$ such that $H(x)=x$ has a solution. As $n$ increases, this converges very fast to $1-1/e\approx0.632$ (where $e\approx2.718$ is the base of the natural logarithm).
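As a quick numerical check of the formula above, here is a minimal Python sketch that evaluates $1-(1-2^{-n})^{(2^n)}$ for a few values of $n$ and compares it with $1-1/e$. The function name `fixed_point_probability` is only an illustrative choice; `log1p`/`expm1` are used so that the computation stays accurate for large $n$, where $1-2^{-n}$ would otherwise round to 1 in floating point.

```python
import math


def fixed_point_probability(n: int) -> float:
    """Probability that a uniformly random function on n-bit strings
    has at least one fixed point: 1 - (1 - 2^-n)^(2^n)."""
    size = 2 ** n
    # (1 - 1/size)^size == exp(size * log1p(-1/size)); expm1 avoids cancellation.
    return -math.expm1(size * math.log1p(-1.0 / size))


limit = 1.0 - 1.0 / math.e
for n in (1, 2, 4, 8, 16, 128, 256):
    p = fixed_point_probability(n)
    print(f"n={n:3d}  P={p:.6f}  difference from 1-1/e: {p - limit:+.2e}")
```

For $n=1$ the formula gives exactly $1-(1/2)^2=0.75$; the values then drop quickly towards $1-1/e$.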

This does not tell us whether MD5 has the property that there exists a solution to $\operatorname{MD5}(x)=x$ (which would be a 128-bit bitstring $x$). The best we can say is that it likely holds, with odds of about 63%, but determining whether the assertion is true or false is beyond our current computing power (the best method we have is exhaustive search; if the answer is no, it would require $2^{128}$ hashes; otherwise it is still likely to require over $2^{126}$ hashes, which is beyond reach).
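Exhaustive search is out of reach for the full 128 bits, but it is easy on a toy version. The sketch below is only an illustration under an assumption that is not part of the answer: it truncates MD5 to its first 2 bytes ($n=16$) and tries every 2-byte input. Whether this particular truncated function has a fixed point is, as for any fixed function, a yes/no fact that the search settles; the helper names are hypothetical.

```python
import hashlib
from typing import List


def truncated_md5(x: bytes, n_bytes: int) -> bytes:
    """Toy hash: the first n_bytes bytes of MD5(x)."""
    return hashlib.md5(x).digest()[:n_bytes]


def find_fixed_points(n_bytes: int) -> List[bytes]:
    """Exhaustively try every n_bytes-long input and keep those with H(x) == x."""
    fixed = []
    for i in range(256 ** n_bytes):
        x = i.to_bytes(n_bytes, "big")
        if truncated_md5(x, n_bytes) == x:
            fixed.append(x)
    return fixed


# 2 bytes -> 2^16 = 65536 candidates, which takes well under a second.
points = find_fixed_points(2)
print(f"{len(points)} fixed point(s):", [p.hex() for p in points])
```

The same loop for full MD5 would need $2^{128}$ iterations, which is the obstacle described above.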

PHP specific: if md5($string) === $string had a solution, it would be a 32-character string of lowercase hexadecimal characters; we are not hashing the same $2^{128}$ candidates as above, so the question is not equivalent, but the reasoning can be adapted, and again the best we can say is that a solution likely exists, with odds of about 63%.
Further, the original question asked whether there is a string such that md5($string) == $string. To answer this, we must take into account how the == operator works in PHP due to type juggling (it holds that "0042" == "42", and "20e2" == " +002000"). It is overwhelmingly likely that there is a solution (just consider that among the $2^{200}$ strings consisting of 200 characters, each a space or a tab, followed by a final 0, we expect about $31\cdot2^{72}$ to hash to one of "00000000000000000000000000000000", "000000000000000000000000000000e0" .. "0e000000000000000000000000000000"); however, we can't exhibit one.


It is easy to define a hash function $H$ such that $H(x)=x$ has no solution: for example, define $$H(x)=\begin{cases}x\oplus1&\text{if }\operatorname{MD5}(x)=x\\\operatorname{MD5}(x)&\text{otherwise}\end{cases}$$


It is also easy to define a hash function $H$ such that $H(x)=x$ has at least one solution: for example, choose some arbitrary 128-bit constant like $k=\text{af5d2bc6c9181f76f3161f43f41f6aeb}$, and define $$H(x)=\begin{cases}k&\text{if }x=k\\\operatorname{MD5}(x)&\text{otherwise}\end{cases}$$
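The two constructions above can be sketched in Python as follows. This is a minimal illustration, assuming inputs and outputs are byte strings and interpreting $x\oplus1$ as flipping the lowest bit of the last byte; the function names are hypothetical.

```python
import hashlib


def md5(x: bytes) -> bytes:
    return hashlib.md5(x).digest()


def flip_last_bit(x: bytes) -> bytes:
    """x XOR 1: flip the least significant bit of the last byte."""
    return x[:-1] + bytes([x[-1] ^ 1])


def h_without_fixed_point(x: bytes) -> bytes:
    """MD5, patched so that H(x) == x can never hold."""
    digest = md5(x)
    # Only reachable if x is a 16-byte fixed point of MD5 (none is known).
    return flip_last_bit(x) if digest == x else digest


# The arbitrary 128-bit constant k from the text.
K = bytes.fromhex("af5d2bc6c9181f76f3161f43f41f6aeb")


def h_with_fixed_point(x: bytes) -> bytes:
    """MD5, patched so that H(K) == K holds by construction."""
    return K if x == K else md5(x)


print(h_with_fixed_point(K) == K)      # True by construction
print(h_without_fixed_point(K) != K)   # True for every input
```

Both functions agree with MD5 on all inputs except (at most) the patched ones.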


There can be no $x$ such that for all possible hash functions $H$, $H(x)=x$.

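Proof by contradiction: assume there is such an $x$, and take any hash function $H$; by assumption $H(x)=x$. Now consider the function $\tilde H$ defined by $\tilde H(y)=H(y)\oplus1$ for all $y$. It is also a hash function, yet $\tilde H(x)=x\oplus1\ne x$, contradicting the assumption.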

fgrieu
11

Yes, you can create many such functions.

For instance, let's build such a function based on SHA512. Generate some random value $m_0$ and compute its hash. This step matters because there is no guarantee that every 512-bit number has a pre-image.

So, let $h_0 = \operatorname{SHA512}(m_0)$. After the hash is generated, throw $m_0$ away. Technically you can do that as follows. Create a program that generates $m_0$ as a stream: it generates a 64-byte block of random values, feeds it to the hash, overwrites these 64 bytes with a new block, feeds that to the hash, and so on. Thus, the program never holds the full value of $m_0$ at any time.

Now calculate the hash of $h_0$: $h_1 = \operatorname{SHA512}(h_0)$. With very high probability $h_1$ will differ from $h_0$.

Now define a new hash function as follows: \begin{align} \operatorname{hash}(x) &= (\operatorname{SHA512}(x) + h_0 - h_1 + 2^{512}) \bmod 2^{512}\end{align}

Calculate this hash function for $h_0$:

\begin{align} \operatorname{hash}(h_0) &= (\operatorname{SHA512}(h_0) + h_0 - h_1 + 2^{512}) \bmod 2^{512} \\ &= (h_1 + h_0 - h_1 + 2^{512}) \bmod 2^{512} \\ &= (h_0 + 2^{512}) \bmod 2^{512} \\ &= h_0 \end{align}

Thus \begin{align} \operatorname{hash}(h_0) = h_0 \end{align}
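The construction above can be sketched in Python. This is only an illustration under stated assumptions: digests are interpreted as big-endian 512-bit integers (the text does not fix an encoding), $m_0$ is drawn with `os.urandom` in one piece rather than streamed, and the names `sha512_int`, `to_bytes` and `shifted_hash` are hypothetical.

```python
import hashlib
import os

MOD = 2 ** 512


def sha512_int(x: bytes) -> int:
    """SHA-512 digest of x as a big-endian integer (encoding is an assumption)."""
    return int.from_bytes(hashlib.sha512(x).digest(), "big")


def to_bytes(value: int) -> bytes:
    """Encode a 512-bit integer as 64 big-endian bytes."""
    return value.to_bytes(64, "big")


m0 = os.urandom(1024)          # random m_0; not streamed here, for brevity
h0 = sha512_int(m0)            # h_0 = SHA512(m_0)
del m0                         # "throw m_0 away"
h1 = sha512_int(to_bytes(h0))  # h_1 = SHA512(h_0)
OFFSET = (h0 - h1) % MOD       # same as (h_0 - h_1 + 2^512) mod 2^512


def shifted_hash(x: bytes) -> bytes:
    """hash(x) = (SHA512(x) + h_0 - h_1) mod 2^512."""
    return to_bytes((sha512_int(x) + OFFSET) % MOD)


# h_0 is a fixed point by construction.
assert shifted_hash(to_bytes(h0)) == to_bytes(h0)
print("fixed point:", to_bytes(h0).hex())
```

Each run picks a fresh $m_0$, so each run defines a different function with a different fixed point.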

Our transformation adds a fixed constant modulo $2^{512}$, that is, it is a rotation (cyclic shift) of the space of 512-bit values. This operation does not change the cryptographic properties, so the cryptographic properties of our hash function are the same as those of SHA512:

  • Pre-image resistance: Except for the single value $h_0$, which is a pre-image of itself, finding a pre-image of any other value is as hard as for SHA512.
  • 2nd pre-image resistance: For any input and its hash, finding another input that gives the same hash is as hard as for SHA512. This also holds for $h_0$, because we don't know what $m_0$ was.
  • Collision resistance: For any other hashes, finding collisions is as hard as for SHA512. This also holds for $h_0$, because we don't know what $m_0$ was.

In this manner you can create other hash functions with other fixed points.

SEJPM
mentallurg
-2

With any hash function as you ask:

No.

If you write a hash function which calculates the hash value in some way and then appends a t to the result (because you like the letter), then no matter what your input string is, the hash result will be different from your input.

For specific hash functions:

Sure, it could be; especially with a "bad" hash function like "first 3 letters of the string".


Since a clarification (see comment) pointed out that the question only targets mature hash functions, please refer to user2351586's answer for MD5 specifics.

-2

This won’t be possible, because the MD5 algorithm uses a left bit rotation by $s$ places, where $s$ varies for each operation. So, the MD5 of a string can never be the same string.

Refer to the Algorithm section of Wikipedia’s MD5 article.

Every hashing algorithm has its basis in block ciphers, and byte-shifting is involved as a core part. So, it won’t be possible for a plaintext $m$ to be equal to the ciphertext $c$.

Mike Edward Moras