
I want a way to 'scramble' strings (such as words or names) in a predictable way, but I never want anyone, myself included, to be able to 'unscramble' them.

I found that perhaps what I am looking for is a cryptographically secure hash.

But can it do these things?

1 - preserve the length of the original string?

2 - preserve the format (keep numbers as numbers, or keep characters within certain ranges)?

3 - be collision resistant?

4 - be reusable over a finite period of time, from a few minutes to a few months? (I need to reuse the algorithm over and over, so that a string 'scrambled' today comes out exactly the same way in the future using the same algorithm.)

EDIT: The reason I use the term cryptographically secure hash is that it is supposed to be 'practically impossible' to invert. It is this impossible-to-invert property that I am looking for. I do not care to recover the meaning or content of the original message. For example, if encryption were used, I would not want someone to be able to determine a 'key' by analyzing the output message, or to find patterns in the output message that would let them determine the input message.

I think the closest thing to what I seek is probably Format Preserving Encryption; however, my concern is that the encryption is reversible. I'm curious whether it can be made irreversible. That is when a hash came to mind.

erotavlas

1 Answer


Cryptographically secure hashes usually take bitstrings of arbitrary length as input and output a fixed-length bitstring. The "secure" part means being collision resistant and preimage resistant, which gives you a practical one-way function, and those are the properties you want for "scrambling".
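To make the fixed-length point concrete, here is a minimal sketch in Python using SHA-256 from the standard `hashlib` module. The helper name `sha256_hex` and the sample strings are only illustrative assumptions; the point is that a plain hash alone does not satisfy requirement 1 (length preservation).

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hex string (illustrative helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Inputs of different lengths all map to 64 hex characters (256 bits).
    for word in ("Al", "Alice", "Alice Liddell of Oxford"):
        print(len(word), len(sha256_hex(word)))
```

The digest is also always hexadecimal (or raw bytes), so the format of the input is not preserved either.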

As fgrieu posted in the comments, one easy way to do this is to use an encryption scheme and use the plaintext as the key as well. That means that in order to decrypt the ciphertext, you would need the plaintext. Formally:

$$F(p):= E_p(p)$$ with $E_k(p)$ denoting an encryption method with key $k$. If you want to use format preserving encryption, you might have to adapt how the plaintext is actually used for the "key input".

One more remark: Usually the key should be generated uniformly at random, and depending on your format, this might not be the case. However, if you apply a normal hash function to the input $p$, you should get a uniformly distributed hash value, and if you use this as the key, you should be fine. Theoretical remark: in the random oracle model this should work just fine. So we can adapt the construction to:

$$F(p):= \hat{E}_{H(p)}(p)$$

with a format preserving encryption function $\hat{E}_k(p)$ and a cryptographically secure hash function $H(p)$, where the key length $|k|$ is less than or equal to the length $|H(p)|$ of the hash function's output.
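As a rough illustration of $F(p) = \hat{E}_{H(p)}(p)$, the sketch below uses SHA-256 for $H$ and a toy stand-in for $\hat{E}$: each character is shifted within its own class (digits, lowercase, uppercase) by an HMAC-SHA256 keystream. This stand-in is an assumption made purely for illustration and is not a vetted format preserving encryption scheme such as FF1; the names `CLASSES`, `keystream`, `toy_fpe_encrypt` and `scramble` are likewise hypothetical.

```python
import hashlib
import hmac
import string

# Character classes that are preserved; anything else is passed through unchanged.
CLASSES = (string.digits, string.ascii_lowercase, string.ascii_uppercase)


def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudorandom bytes from `key` using HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hmac.new(key, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:n]


def toy_fpe_encrypt(key: bytes, text: str) -> str:
    """Toy stand-in for a format preserving cipher (NOT FF1, illustration only)."""
    ks = keystream(key, 2 * len(text))
    result = []
    for i, ch in enumerate(text):
        # Two keystream bytes per character keep the modulo bias small, not zero.
        shift = int.from_bytes(ks[2 * i:2 * i + 2], "big")
        for alphabet in CLASSES:
            if ch in alphabet:
                ch = alphabet[(alphabet.index(ch) + shift) % len(alphabet)]
                break
        result.append(ch)
    return "".join(result)


def scramble(text: str) -> str:
    """F(p) = E_{H(p)}(p): the key is the SHA-256 hash of the plaintext itself."""
    key = hashlib.sha256(text.encode("utf-8")).digest()
    return toy_fpe_encrypt(key, text)


if __name__ == "__main__":
    print(scramble("Alice42"))  # same length, letters stay letters, digits stay digits
```

The output is deterministic, so the same input scrambles the same way every time. Note that because the function is unkeyed and deterministic, short or guessable inputs can still be recovered by simply trying candidates, and on a small, length-limited domain collisions cannot be ruled out.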

tylo