
What is the best way to generate a unique 64 bit numeric database key from a string?

For example: if the string is a URL, you could generate a 64 bit key using SipHash or by truncating a SHA256 hash to 64 bits, but collisions are likely at $2^{32}$ (~ 4 billion) entries.
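A minimal sketch of the truncation approach and of where the birthday bound puts the collision risk. It assumes the string is UTF-8 encoded and keeps the first 8 bytes of the SHA-256 digest. `key64_sha256` and `collision_probability` are hypothetical helper names, and the probability uses the standard birthday approximation $1 - e^{-n(n-1)/2^{65}}$ for $n$ random 64-bit keys.

```python
import hashlib
import math

def key64_sha256(s: str) -> int:
    """First 64 bits of SHA-256 over the UTF-8 bytes of s, as an unsigned int."""
    digest = hashlib.sha256(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: P(at least one collision) ~ 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))

key = key64_sha256("https://example.com/")
print(hex(key))
for exp in (24, 28, 32, 36):
    print(f"2^{exp} keys: P(collision) ~ {collision_probability(2 ** exp, 64):.3g}")
```

At $n = 2^{32}$ the approximation gives about 0.39, which is why collisions are described as likely around that size.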

However, if instead you truncated the hash to (64 bits - X bits), and concatenated the resulting hash with the first X bits of the input string, you would reduce the chance of collisions since two similar strings are unlikely to produce the same hash code:

```
key_bits = 64
x_bits = 8
hash_code = sha256(url_string)
key = truncate(hash_code, key_bits - x_bits) + truncate(url_string, x_bits)
```
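One way to make the pseudocode concrete, assuming "+" means bit concatenation with the hash bits on top and the string bits at the bottom. `prefixed_key` is a hypothetical name, and the sketch assumes `0 < x_bits < key_bits` and that the string's UTF-8 encoding has at least `ceil(x_bits / 8)` bytes.

```python
import hashlib

def prefixed_key(url_string: str, key_bits: int = 64, x_bits: int = 8) -> int:
    """Top (key_bits - x_bits) bits of SHA-256, followed by the first x_bits of the string."""
    data = url_string.encode("utf-8")
    hash_bits = key_bits - x_bits
    h = int.from_bytes(hashlib.sha256(data).digest(), "big")  # whole digest as a 256-bit int
    hash_part = h >> (256 - hash_bits)                        # keep its top hash_bits bits
    n_bytes = (x_bits + 7) // 8
    # first x_bits of the string: read enough leading bytes, drop the surplus low bits
    prefix = int.from_bytes(data[:n_bytes], "big") >> (8 * n_bytes - x_bits)
    return (hash_part << x_bits) | prefix
```

With the defaults, `prefixed_key("https://example.com/")` ends in the byte `0x68` (`'h'`), like every other `http`/`https` URL.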

Is this correct? Is there a better way?

otus
espeed

2 Answers


Unfortunately, without some additional assumptions about the strings (e.g. that the first 8 bytes are unique), there is no way to reduce the chance of collision below that of an ideal 64-bit hash. If you need unique 64-bit IDs, the options are:

  1. Do a database lookup when generating keys and pick another key if it is already in use (you state that you want to avoid this).

  2. Ignore duplicate keys, e.g. set things up so that only one of the two entries is kept. As long as you don't go much above $2^{32}$ entries, the expected number of collisions will be very low (roughly $n^2/2^{65}$ colliding pairs for $n$ entries, i.e. about 0.5 at $n = 2^{32}$), and depending on your task, it may be acceptable to ignore a small fraction of the entries.
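Both options can be sketched against SQLite, used here only as an in-memory stand-in for the real database. The `pages` table, `candidate_key` (which rehashes with an attempt counter so a retry gets a fresh key) and the other helper names are hypothetical. Keys are read as signed 64-bit integers because that is SQLite's `INTEGER` range.

```python
import hashlib
import sqlite3

def open_db() -> sqlite3.Connection:
    """In-memory stand-in for the real database (hypothetical `pages` table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
    return conn

def candidate_key(url: str, attempt: int) -> int:
    """Signed 64-bit key from SHA-256 of (4-byte attempt counter || url)."""
    data = attempt.to_bytes(4, "big") + url.encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big", signed=True)

def insert_with_retry(conn: sqlite3.Connection, url: str, max_attempts: int = 16) -> int:
    """Option 1: look the key up and try the next candidate if another URL owns it."""
    for attempt in range(max_attempts):
        key = candidate_key(url, attempt)
        row = conn.execute("SELECT url FROM pages WHERE id = ?", (key,)).fetchone()
        if row is None:
            conn.execute("INSERT INTO pages (id, url) VALUES (?, ?)", (key, url))
            return key
        if row[0] == url:
            return key  # this URL is already stored under this key
    raise RuntimeError(f"no free key for {url!r} after {max_attempts} attempts")

def insert_or_ignore(conn: sqlite3.Connection, url: str) -> bool:
    """Option 2: keep whichever entry arrived first; return False if nothing was inserted."""
    before = conn.total_changes
    conn.execute("INSERT OR IGNORE INTO pages (id, url) VALUES (?, ?)",
                 (candidate_key(url, 0), url))
    return conn.total_changes > before
```

Option 1 needs the string itself stored alongside the key, since a URL that lost a collision lives under a later candidate; option 2 silently drops the second entry.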

Mike Edward Moras
jbms

> However, if instead you truncated the hash to (64 bits - X bits), and concatenated the resulting hash with the first X bits of the input string, you'd reduce the chance of collision since two similar strings are unlikely to produce the same hash code:

For a good cryptographic hash, a pair of similar strings is just as likely to collide as a pair of very different strings.
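A small illustration of why the string prefix does not help for URLs: nearly every URL starts with `h`, so the 8 prefix bits are effectively constant and only 56 bits of the key vary. The sketch assumes uniformly random hash bits and uses the birthday approximation $n \approx \sqrt{2N \ln(1/(1-p))}$ with $N = 2^{\text{bits}}$; `birthday_n` is a hypothetical helper.

```python
import math

def birthday_n(bits: int, p: float = 0.5) -> float:
    """Number of uniformly random `bits`-bit keys at which P(collision) reaches p."""
    return math.sqrt(2.0 * 2.0 ** bits * math.log(1.0 / (1.0 - p)))

# Every http/https URL starts with 'h', so the 8 "prefix" bits never change
# and only the 56 hash bits of the 64-bit key actually vary.
urls = ["https://example.com/a", "https://example.com/b", "http://example.org/"]
print({u.encode()[0] for u in urls})  # {104}, i.e. 'h'
print(f"64 varying bits: ~{birthday_n(64):.3g} keys for a 50% collision chance")
print(f"56 varying bits: ~{birthday_n(56):.3g} keys for a 50% collision chance")
```

Losing 8 hash bits moves the 50% point down by a factor of $2^4 = 16$, so in this setting the scheme makes collisions more likely, not less.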

If you are limited to 64-bit keys, the best you can do is take a strong hash (or a key-derivation function, if the strings are guessable and that enables an attack) and either hope there are no collisions or have a backup solution for when one occurs.
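As one example of the key-derivation route, Python's standard `hashlib.pbkdf2_hmac` can produce an 8-byte output directly via `dklen`. The choice of PBKDF2-HMAC-SHA256, the iteration count, and the secret salt `SECRET_SALT` are all assumptions for illustration; the salt is meant to be kept out of the database so guessed strings cannot be mapped to keys offline. The output is still only 64 bits, so the birthday bound and the need for a fallback are unchanged.

```python
import hashlib

# Hypothetical secret salt: generate it randomly and store it outside the database.
SECRET_SALT = b"replace-with-a-random-secret"

def kdf_key64(s: str, iterations: int = 100_000) -> int:
    """64-bit key from PBKDF2-HMAC-SHA256 with an 8-byte derived key (dklen=8)."""
    raw = hashlib.pbkdf2_hmac("sha256", s.encode("utf-8"), SECRET_SALT, iterations, dklen=8)
    return int.from_bytes(raw, "big")
```

The iteration count trades key-generation speed against the cost of guessing; a plain keyed hash would also hide the mapping but without the slowdown.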

otus