
I am trying to understand the concept of a universal hash function.

From Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein:

We now define the hash function $h_{ab}$ for any $a \in \mathbb{Z}_p^*$ and any $b \in \mathbb{Z}_p$ using a linear transformation followed by reductions modulo $p$ and then modulo $m$:

$h_{ab}(d) = ((ad + b) \bmod p) \bmod m$

I understand the above from an algorithmic point of view, for use in a hash table: if you have some data that you need to hash to get a bucket/index number, you pick a random $a$ and $b$ from their respective sets and evaluate the function $h_{ab}$ at $d$. Once $a$ and $b$ have been picked, the function has only a single input.
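To make the hash-table reading concrete, here is a minimal Python sketch of it. The names `P`, `M`, and `pick_hash` are made up for illustration, and the concrete values of $p$ and $m$ are assumptions, not taken from the textbook; inputs are assumed to be integers in $\mathbb{Z}_p$.

```python
import secrets

# Assumed parameters for illustration only.
P = 2_147_483_647  # the prime 2^31 - 1; inputs d are assumed to lie in [0, P)
M = 16             # number of buckets in the table


def pick_hash(p=P, m=M):
    """Draw one function h_ab at random from the family."""
    a = 1 + secrets.randbelow(p - 1)  # a in Z_p^* = {1, ..., p-1}
    b = secrets.randbelow(p)          # b in Z_p   = {0, ..., p-1}

    def h(d):
        # h_ab(d) = ((a*d + b) mod p) mod m
        return ((a * d + b) % p) % m

    return h


# Once a and b are fixed, h takes a single input.
h = pick_hash()
bucket = h(123456)
```

Here `a` and `b` are chosen once when the table is set up, which is why the resulting `h` looks like a one-input function.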

But if we instead use it to compute a hash in cryptography (e.g. in a one-time MAC such as the Carter-Wegman MAC), then we have 2 inputs: a key (a cryptographic key, rather than a key to be hashed for a hash table) and the message to be hashed. So how does it work here? For simplicity, if I assume a single block of message, then how do I use this universal function to evaluate the hash?

TL;DR - a universal hash function has only one input when used with hash tables, but 2 inputs when used in cryptography for a one-time MAC. What is the relation between these 2 kinds of hash functions? Everywhere I look (textbooks, Wikipedia, etc.), universal hashing is defined in terms of single-input functions, so how does it differ for functions taking 2 inputs?

user93353

1 Answer


For simplicity, if I assume a single block of message, then how do I use this universal function to evaluate the hash?

We shift some parameters from being part of the concrete function instance into the key.

For your example of $h_{ab}(d)=((ad+b) \bmod p) \bmod m$, the following mapping is usual:

  • $d$ is the input to be hashed
  • $a$ is the secret key of the instantiation, reusable across multiple invocations
  • $b$ is the whitening value; it must be freshly random for each invocation and hidden from adversaries
  • $p,m$ are system parameters that depend on the field you want to work in and on the range into which you have to encode your result, e.g. you could pick $p$ to be a relatively large prime and $m=2^k$ for some $k$.

Alternatively, if you insist on having only two non-parameter inputs (a key and a message), you can also force $a$ to have the same usage constraints as $b$, i.e. fresh and secret for each invocation, and declare them both "the key".
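As a rough illustration of this mapping for a single-block message, here is a minimal Python sketch. The function names (`keygen`, `fresh_whitening`, `tag`, `verify`) and the concrete choices of $p$ and $m$ are hypothetical, picked only for the example. How the verifier learns the per-message $b$ is outside the scope of the sketch.

```python
import secrets

# Assumed system parameters, for illustration only.
P = 2**61 - 1  # a relatively large prime (a Mersenne prime)
M = 2**32      # m = 2^k with k = 32, i.e. 32-bit tags


def keygen():
    """Pick the reusable secret key a from Z_p^*."""
    return 1 + secrets.randbelow(P - 1)


def fresh_whitening():
    """Pick a fresh, secret whitening value b from Z_p (one per message)."""
    return secrets.randbelow(P)


def tag(a, b, d):
    """Compute h_ab(d) = ((a*d + b) mod p) mod m for one message block d."""
    if not 0 <= d < P:
        raise ValueError("message block must be an integer in [0, P)")
    return ((a * d + b) % P) % M


def verify(a, b, d, t):
    """Recompute the tag and compare (plain ==, not constant-time)."""
    return tag(a, b, d) == t


a = keygen()           # long-term key
b = fresh_whitening()  # must never be reused for a second message
t = tag(a, b, 42)
assert verify(a, b, 42, t)
```

Treating the pair `(a, b)` as "the key" gives the two-input view: `tag(key, message)` is the same computation with the function index moved into the key argument.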

SEJPM