
Let $F$ be a secure PRF and $H$ a universal hash function.

How can I exhibit a pair $(F,H)$ whose composition $$S'((k_1, k_2), m) = F(k_2, H(k_1,m))$$

is an insecure MAC (or an insecure PRF, since a MAC can be defined as a PRF)?


I guess that the trick for finding such a pair $(F,H)$ would be to construct some $H$ whose image space is small enough that a collision is easy to find. However, I'm not good at coming up with examples of such functions, and the books I've read always treat them abstractly.

Daniel

2 Answers


Without breaking $F$, you can't: $S'$ is a PRF with almost the same security as $F$.

Let $k_1$ and $k_2$ be independent uniform random keys. Let $F$ be a PRF, with advantage $$\operatorname{Adv}^{\operatorname{PRF}}_F(A) = \lvert\Pr[A(F_{k_2}) = 1] - \Pr[A(f) = 1]\rvert$$ for any distinguisher $A$, where $f$ is a uniform random function with the domain and codomain of $F$. Let $H$ be an $\varepsilon$-almost universal hash family, so that $\Pr[H_{k_1}(x) = H_{k_1}(y)] \leq \varepsilon$ for any fixed $x \ne y$, with the probability taken over $k_1$. (A family called universal without qualification has $\varepsilon = 1/|T|$, where $T$ is the codomain of $H$.)

Define $$S'_{k_1,k_2}(m) = F_{k_2}(H_{k_1}(m)).$$
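To make the composition concrete, here is a minimal Python sketch. Everything specific in it is an assumption chosen for illustration and does not come from the question or the proof: HMAC-SHA256 stands in for the PRF $F$, a polynomial evaluation hash over the prime field of size $2^{61} - 1$ stands in for $H$, and the names `H`, `F`, `S_prime` and `keygen` are hypothetical. Each byte is shifted by one so that no coefficient is zero, which keeps distinct messages (including ones of different lengths) mapped to distinct polynomials.

```python
import hashlib
import hmac
import secrets

# Assumed field size for the illustrative hash: the Mersenne prime 2^61 - 1.
P = 2**61 - 1


def H(k1: int, m: bytes) -> int:
    """Polynomial evaluation hash: sum of c_i * k1^(n-i+1) mod P, with c_i = byte + 1."""
    acc = 0
    for b in m:
        # Horner's rule; the +1 keeps every coefficient nonzero.
        acc = (acc + b + 1) * k1 % P
    return acc


def F(k2: bytes, h: int) -> bytes:
    """Stand-in PRF (an assumption): HMAC-SHA256 keyed with k2 on the 8-byte hash value."""
    return hmac.new(k2, h.to_bytes(8, "big"), hashlib.sha256).digest()


def S_prime(k1: int, k2: bytes, m: bytes) -> bytes:
    """The composition S'((k1, k2), m) = F(k2, H(k1, m))."""
    return F(k2, H(k1, m))


def keygen() -> tuple[int, bytes]:
    """Sample independent uniform keys k1 (field element) and k2 (32 bytes)."""
    return secrets.randbelow(P), secrets.token_bytes(32)
```

A caller would sample `k1, k2 = keygen()` once and then compute `S_prime(k1, k2, message)` as the tag. The adversary only ever sees outputs of `F`, never the intermediate value `H(k1, m)`, which is the point the proof below relies on.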

Fix any PRF-distinguisher $A'$ for $S'$ making $q$ queries, and let $U$ be a uniform random function with the domain and codomain of $S'$. We will bound the advantage of $A'$ at distinguishing $S'$ in terms of the advantage of another algorithm $A$ at distinguishing $F$ and the collision probability $\varepsilon$ of $H$: \begin{align*} \operatorname{Adv}^{\operatorname{PRF}}_{S'}(A') &= \lvert\Pr[A'(S'_{k_1,k_2}) = 1] - \Pr[A'(U) = 1]\rvert \\ &\leq \operatorname{Adv}^{\operatorname{PRF}}_F(A) + \binom{q}{2} \varepsilon, \end{align*} where $A$ is a PRF-distinguisher for $F$. As long as $\operatorname{Adv}^{\operatorname{PRF}}_F(A)$ is small and $q$ is not too large, $\operatorname{Adv}^{\operatorname{PRF}}_{S'}(A')$ is small too.
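To get a feel for the size of the second term, here is a worked instance under assumed parameters that are not part of the statement itself: $\varepsilon = 2^{-128}$ and $q = 2^{32}$ queries. $$\binom{q}{2}\,\varepsilon = \frac{2^{32}\,(2^{32} - 1)}{2} \cdot 2^{-128} < 2^{63} \cdot 2^{-128} = 2^{-65}.$$ With a much smaller codomain, say $\varepsilon = 2^{-32}$, the same term already exceeds 1 at $q = 2^{17}$, since $\binom{2^{17}}{2} \cdot 2^{-32} = 2^{16}(2^{17} - 1) \cdot 2^{-32} \approx 2$, so the bound becomes vacuous there.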

We will do this by the triangle inequality with the intermediate probability $\Pr[A'(f \circ H_{k_1}) = 1]$ that $A'$ returns 1 on a variant $f \circ H_{k_1}$ of $S'_{k_1,k_2} = F_{k_2} \circ H_{k_1}$, where a uniform random $f$ has been substituted for $F_{k_2}$.

  1. Define the PRF-distinguisher $A$ for $F$ as follows: $A$ samples $k_1$ itself, then runs $A'$ and answers each query $x$ of $A'$ with $\mathcal O(H_{k_1}(x))$. In short, $A(\mathcal O) = A'(\mathcal O \circ H_{k_1})$. Then \begin{align*} \operatorname{Adv}^{\operatorname{PRF}}_F(A) &= \lvert\Pr[A(F_{k_2}) = 1] - \Pr[A(f) = 1]\rvert \\ &= \lvert\Pr[A'(F_{k_2} \circ H_{k_1}) = 1] - \Pr[A'(f \circ H_{k_1}) = 1]\rvert. \end{align*} If $A'$ is a good distinguisher for $S'$, we will find that $A$ is a good distinguisher for $F$, unless $A'$ just got lucky finding collisions in $H$.

  2. Now consider the $q$ queries $x_1, x_2, \ldots, x_q$ submitted by $A'$ for the oracle $f \circ H_{k_1}$.

    If $A'$ could query $H_{k_1}$ directly, it might find a collision among as few as three inputs with high probability, because we assume only the weak property that any two distinct inputs chosen independently of $k_1$ collide with probability at most $\varepsilon$. For example, in a polynomial evaluation MAC $M_{r,s}(m) = s + \sum_{i=1}^{|m|} m_i r^{|m| - i + 1}$, the adversary could trivially recover the keys $r$ and $s$ from two distinct queries and find arbitrarily many collisions with probability 1 after that.

    But since $f$ is a uniform random function, the only information $A'$ can learn from oracle access to $f \circ H_{k_1}$ is whether the queries collide in one of $H_{k_1}$ or $f$, or definitely do not collide in either. The adversary can adaptively act on the information that queries might collide only if a collision actually occurs in $H_{k_1}$, which happens with probability at most $\varepsilon$ for any pair of inputs submitted. Thus, to study $\Pr[A'(f \circ H_{k_1}) = 1]$, it suffices to set a bound on the probability that there is a collision at all.

    Among the queries $x_1, x_2, \ldots, x_q$ submitted by $A'$ to $f \circ H_{k_1}$, the event $C$ of a collision in $H_{k_1}$ has probability \begin{multline*} \Pr[C] = \Pr[\exists i < j\colon H_{k_1}(x_i) = H_{k_1}(x_j)] \\ \leq \sum_{i<j} \Pr[H_{k_1}(x_i) = H_{k_1}(x_j)] \leq \sum_{i<j} \varepsilon = \binom{q}{2} \varepsilon. \end{multline*} In the event $\lnot C$ that the queries do not collide in $H_{k_1}$, the distribution of each $f(H_{k_1}(x_i))$ is independent uniform random, identical to the distribution of $U(x_i)$. Hence necessarily $\Pr[A'(f \circ H_{k_1}) = 1 \mid \lnot C] = \Pr[A'(U) = 1]$, so that \begin{align*} \Pr[A'(f \circ H_{k_1}) = 1] &= \Pr[A'(f \circ H_{k_1}) = 1 \mid C]\,\Pr[C] \\ &\quad + \Pr[A'(f \circ H_{k_1}) = 1 \mid \lnot C]\,\Pr[\lnot C] \\ &\leq \Pr[C] + \Pr[A'(f \circ H_{k_1}) = 1 \mid \lnot C] \\ &\leq \binom{q}{2} \varepsilon + \Pr[A'(U) = 1], \end{align*} and thus $\Pr[A'(f \circ H_{k_1}) = 1] - \Pr[A'(U) = 1] \leq \binom{q}{2} \varepsilon$.

  3. Summing up, \begin{align*} \operatorname{Adv}^{\operatorname{PRF}}_{S'}(A') &= \lvert\Pr[A'(S'_{k_1,k_2}) = 1] - \Pr[A'(U) = 1]\rvert \\ &\leq \lvert\Pr[A'(F_{k_2} \circ H_{k_1}) = 1] - \Pr[A'(f \circ H_{k_1}) = 1]\rvert \\ &\quad + \lvert\Pr[A'(f \circ H_{k_1}) = 1] - \Pr[A'(U) = 1]\rvert \\ &\leq \operatorname{Adv}^{\operatorname{PRF}}_F(A) + \binom{q}{2} \varepsilon, \end{align*} QED.
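The remark in step 2 about a bare polynomial evaluation MAC can be checked concretely. The sketch below is only an illustration under assumed parameters: the field is the integers modulo the prime $2^{61} - 1$, messages are lists of field elements, and the names `poly_mac`, `recover_keys` and `colliding_pair` are hypothetical. It recovers $r$ and $s$ from two single-block queries and then builds two distinct messages with the same tag, which is exactly what the outer $F_{k_2}$ prevents by hiding the hash output.

```python
import secrets

# Assumed prime field for the illustration.
P = 2**61 - 1


def poly_mac(r: int, s: int, m: list[int]) -> int:
    """M_{r,s}(m) = s + sum_i m_i * r^(|m| - i + 1) mod P, via Horner's rule."""
    acc = 0
    for c in m:
        acc = (acc + c) * r % P
    return (s + acc) % P


def recover_keys(oracle) -> tuple[int, int]:
    """Recover (r, s) from the tags of the single-block messages [1] and [2]."""
    t1 = oracle([1])  # t1 = s + 1*r
    t2 = oracle([2])  # t2 = s + 2*r
    r = (t2 - t1) % P  # in general: (t2 - t1) / (m2 - m1) in the field
    s = (t1 - r) % P
    return r, s


def colliding_pair(r: int) -> tuple[list[int], list[int]]:
    """Given r, return two distinct two-block messages with equal tags."""
    a, a2, b = 1, 2, 0
    # Solve a*r^2 + b*r = a2*r^2 + b2*r for b2.
    b2 = (b + (a - a2) * r) % P
    return [a, b], [a2, b2]


if __name__ == "__main__":
    r, s = secrets.randbelow(P), secrets.randbelow(P)
    oracle = lambda m: poly_mac(r, s, m)
    assert recover_keys(oracle) == (r, s)
    x, y = colliding_pair(r)
    assert x != y and oracle(x) == oracle(y)
```

The attack needs the raw tag values, which is why it does not carry over to $f \circ H_{k_1}$: there the adversary learns only whether two outputs are equal.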


This follows the structure of the proof of Lemma 3.3 in:

Shay Gueron and Yehuda Lindell, ‘GCM-SIV: Full Nonce Misuse-Resistant Authenticated Encryption at Under One Cycle per Byte’, in Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 109–119

Variants of the theorem appear in many earlier papers, including the MDx-MAC paper that preceded the creation of HMAC, and the HMAC/NMAC security papers.

Squeamish Ossifrage

My reading is that one can't exhibit the counterexample asked for under the definition of universal hash function given in the comments, when that definition is read as stating that $H_{k_1}: m\mapsto H(k_1,m)$ is collision-resistant for fixed $k_1$, including when $k_1$ is random and public.

That follows from the following proposition, and the remark that turning $k_1$ from public to secret can't harm security.

Proposition: Applying a secure PRF $F_{k_2}: h\mapsto F(k_2,h)$ with random secret constant $k_2$ to the output of a public collision-resistant function $H$ yields a secure MAC (at worst only slightly less secure than the weaker of $F$ and $H$).

That proposition holds because distinguishing $F_{k_2}(H(m_i))$ from random, for random secret $k_2$ and chosen distinct messages $m_i$, requires breaking either the indistinguishability of $F_{k_2}$ or the collision-resistance of the public function $H$. Proof sketch: suppose some distinct messages $m_i$ allow distinguishing $F_{k_2}(H(m_i))$ from random. If there is a collision among the $h_i=H(m_i)$, the corresponding pair of $m_i$ breaks the collision-resistance of $H$. Otherwise, the $h_i$ are distinct and can be computed from the hypothesized $m_i$ (since $H$ is public), so we can distinguish the $F_{k_2}(h_i)$ from random for chosen distinct $h_i$, thus breaking the indistinguishability of $F$.


As is apparent from the many revisions and the convoluted argument surrounding $k_1$, I'm struggling quite a bit with this one, especially when I use the more formal definition of a (not necessarily strongly) universal hash function: $H:\mathcal K\times\mathcal M\to\mathcal T$ is a family of universal hash functions when $$\forall(m,m')\in\mathcal M^2,\quad m\ne m'\implies\mathsf{Pr}_{k\in\mathcal K}\Big[H(k,m)=H(k,m')\Big]=\frac1{|\mathcal T|}$$
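For a small family, that definition can be verified exhaustively. The sketch below assumes, purely for illustration, the family $H(k,m) = k\,m \bmod p$ with $\mathcal K = \mathcal M = \mathcal T = \mathbb Z_p$ and $p = 11$. The family and the function names are hypothetical choices, not something taken from the question. Two distinct messages collide only for $k = 0$, so the collision probability is exactly $1/p = 1/|\mathcal T|$.

```python
from fractions import Fraction
from itertools import combinations

# Assumed toy parameter: a small prime so all keys can be enumerated.
P = 11


def H(k: int, m: int) -> int:
    """Toy hash family H(k, m) = k * m mod P over Z_P."""
    return (k * m) % P


def collision_probability(m1: int, m2: int) -> Fraction:
    """Exact probability over a uniform key k that H(k, m1) == H(k, m2)."""
    hits = sum(1 for k in range(P) if H(k, m1) == H(k, m2))
    return Fraction(hits, P)


def is_universal() -> bool:
    """Check the definition for every pair of distinct messages."""
    return all(
        collision_probability(m1, m2) == Fraction(1, P)
        for m1, m2 in combinations(range(P), 2)
    )
```

Note that this family is universal but not strongly universal: $H(k, 0) = 0$ for every key, so the output on a single message is not uniformly distributed.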

fgrieu