Proof that MAC and hash composition is insecure

Question

Let $F$ be a secure PRF and $H$ a universal hash function.

How can I exhibit a pair $(F,H)$ whose composition $$S'((k_1, k_2), m) = F(k_2, H(k_1,m))$$

is an insecure MAC (or an insecure PRF, since a MAC can be defined as a PRF)?

I guess that, to find such a pair $(F,H)$, the trick would be to construct some $H$ whose image space is small enough that collisions are easy to find, but I'm not good at coming up with examples of such functions; the books I read always treat these functions abstractly.

Squeamish Ossifrage · Accepted Answer · 2019-11-06T02:27:07.907

Without breaking $F$, you can't: $S'$ is a PRF with almost the same security as $F$.

Let $k_1$ and $k_2$ be uniform random keys. Let $F$ be a PRF, with advantage $$\operatorname{Adv}^{\operatorname{PRF}}_F(A) = \lvert\Pr[A(F_{k_2}) = 1] - \Pr[A(f) = 1]\rvert$$ for any distinguisher $A$, where $f$ is a uniform random function with the domain and codomain of $F$. Let $H$ be an $\varepsilon$-almost universal hash family, so that $\Pr[H_{k_1}(x) = H_{k_1}(y)] \leq \varepsilon$ for any $x \ne y$. (Without qualification, $\varepsilon = 1/|T|$ where $T$ is the codomain of $H$.)

Define $$S'_{k_1,k_2}(m) = F_{k_2}(H_{k_1}(m)).$$
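To make the construction concrete, here is a minimal sketch of $S'$ in Python. The specific choices are assumptions for illustration, not part of the argument: $H$ is a polynomial-evaluation hash over the prime field of size $2^{127}-1$ on fixed-length messages, and HMAC-SHA256 stands in for the PRF $F$. The names `poly_hash`, `prf`, and `s_prime` are hypothetical.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # Mersenne prime; toy field for the hash (an assumption)

def poly_hash(k1: int, blocks: list[int]) -> int:
    """H_{k1}(m) = sum_i m_i * k1^(n-i+1) mod P, by Horner's rule.

    Assumes all messages have the same number of blocks, each in [0, P).
    """
    acc = 0
    for b in blocks:
        acc = (acc + b) * k1 % P
    return acc

def prf(k2: bytes, x: int) -> bytes:
    """Stand-in for F_{k2}: HMAC-SHA256 over the 16-byte encoding of x."""
    return hmac.new(k2, x.to_bytes(16, "big"), hashlib.sha256).digest()

def s_prime(k1: int, k2: bytes, blocks: list[int]) -> bytes:
    """S'_{k1,k2}(m) = F_{k2}(H_{k1}(m))."""
    return prf(k2, poly_hash(k1, blocks))

k1 = secrets.randbelow(P)      # uniform random hash key
k2 = secrets.token_bytes(32)   # uniform random PRF key
tag = s_prime(k1, k2, [1, 2, 3])
```

For messages of exactly $n$ blocks, two distinct messages collide under this hash only if $k_1$ is a root of a nonzero polynomial of degree at most $n$, so $\varepsilon \leq n/P$ in this sketch.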

Fix any PRF-distinguisher $A'$ for $S'$ making $q$ queries, and let $U$ be a uniform random function with the domain and codomain of $S'$. We will bound the advantage of $A'$ at distinguishing $S'$ in terms of the advantage of another algorithm $A$ at distinguishing $F$ and the collision probability $\varepsilon$ of $H$: \begin{align*} \operatorname{Adv}^{\operatorname{PRF}}_{S'}(A') &= \lvert\Pr[A'(S'_{k_1,k_2}) = 1] - \Pr[A'(U) = 1]\rvert \\ &\leq \operatorname{Adv}^{\operatorname{PRF}}_F(A) + \binom{q}{2} \varepsilon, \end{align*} where $A$ is a PRF-distinguisher for $F$. As long as $\operatorname{Adv}^{\operatorname{PRF}}_F(A)$ is small and $q$ is not too large, $\operatorname{Adv}^{\operatorname{PRF}}_{S'}(A')$ is small too.
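To get a feel for when "$q$ is not too large", the additive term $\binom{q}{2}\varepsilon$ can be evaluated numerically. The sketch below assumes $\varepsilon = 2^{-128}$ purely for illustration; the function name `collision_term` is hypothetical.

```python
import math
from fractions import Fraction

def collision_term(q: int, eps: Fraction) -> Fraction:
    """Return binom(q, 2) * eps, the additive loss in the bound."""
    return math.comb(q, 2) * eps

eps = Fraction(1, 2**128)  # assumed collision probability of H
for log_q in (32, 48, 64):
    term = collision_term(2**log_q, eps)
    print(f"q = 2^{log_q}: binom(q,2) * eps is about 2^{math.log2(float(term)):.1f}")
```

With this $\varepsilon$, the term stays negligible for $q = 2^{32}$ but reaches about $1/2$ at $q = 2^{64}$, which is the usual birthday limit on the number of queries.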

We will do this by the triangle inequality with the intermediate probability $\Pr[A'(f \circ H_{k_1}) = 1]$ that $A'$ returns 1 on a variant $f \circ H_{k_1}$ of $S'_{k_1,k_2} = F_{k_2} \circ H_{k_1}$, where a uniform random $f$ has been substituted for $F_{k_2}$.

Define the PRF-distinguisher $A$ for $F$ by $A(\mathcal O) = A'(\mathcal O \circ H_{k_1})$, where $A$ samples $k_1$ uniformly at random itself and answers each query $x$ of $A'$ with $\mathcal O(H_{k_1}(x))$. Then \begin{align*} \operatorname{Adv}^{\operatorname{PRF}}_F(A) &= \lvert\Pr[A(F_{k_2}) = 1] - \Pr[A(f) = 1]\rvert \\ &= \lvert\Pr[A'(F_{k_2} \circ H_{k_1}) = 1] - \Pr[A'(f \circ H_{k_1}) = 1]\rvert. \end{align*} If $A'$ is a good distinguisher for $S'$, we will find that $A$ is a good distinguisher for $F$, unless $A'$ just got lucky finding collisions in $H$.

Now consider the $q$ queries $x_1, x_2, \ldots, x_q$ submitted by $A'$ to the oracle $f \circ H_{k_1}$.

From queries to $H_{k_1}$ alone, of which we assume only the weak property of collision probabilities on two distinct inputs, an adversary could find a collision among three inputs with high probability—e.g., in a polynomial evaluation MAC $M_{r,s}(m) = s + \sum_{i=1}^{|m|} m_i r^{|m| - i + 1}$ the adversary could trivially recover the keys $r$ and $s$ from two distinct queries and find arbitrarily many collisions with probability 1 after that.
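The key-recovery remark can be made concrete with a small sketch. It assumes arithmetic in the prime field of size $2^{61}-1$ and direct access to the raw polynomial MAC $M_{r,s}$ (that is, without $f$ applied on top); the helper names are hypothetical.

```python
import secrets

P = 2**61 - 1  # toy prime modulus (assumption: arithmetic is in GF(P))

def poly_mac(r: int, s: int, m: list[int]) -> int:
    """M_{r,s}(m) = s + sum_i m_i * r^(|m|-i+1) mod P, by Horner's rule."""
    acc = 0
    for block in m:
        acc = (acc + block) * r % P
    return (acc + s) % P

def recover_keys(a1: int, t1: int, a2: int, t2: int) -> tuple[int, int]:
    """Recover (r, s) from the tags of two distinct one-block messages."""
    # t1 - t2 = (a1 - a2) * r, so divide by (a1 - a2) in GF(P).
    r = (t1 - t2) * pow((a1 - a2) % P, -1, P) % P
    s = (t1 - a1 * r) % P
    return r, s

def colliding_pair(r: int, m1: int, m2: int) -> tuple[list[int], list[int]]:
    """Two distinct two-block messages with the same polynomial value.

    (m1 + 1) * r^2 + (m2 - r) * r = m1 * r^2 + m2 * r.
    """
    return [m1, m2], [(m1 + 1) % P, (m2 - r) % P]

r, s = secrets.randbelow(P), secrets.randbelow(P)
t1, t2 = poly_mac(r, s, [1]), poly_mac(r, s, [2])
assert recover_keys(1, t1, 2, t2) == (r, s)

x, y = colliding_pair(r, 5, 7)
assert x != y and poly_mac(r, s, x) == poly_mac(r, s, y)
```

Once $r$ is known, `colliding_pair` yields a collision for any choice of `m1` and `m2`, which is why the raw hash outputs must never be exposed to the adversary.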

But since $f$ is a uniform random function, the only information $A'$ can learn from oracle access to $f \circ H_{k_1}$ is whether the queries collide in one of $H_{k_1}$ or $f$, or definitely do not collide in either. The adversary can adaptively act on the information that queries might collide only if a collision actually occurs in $H_{k_1}$, which happens with probability at most $\varepsilon$ for any pair of inputs submitted. Thus, to study $\Pr[A'(f \circ H_{k_1}) = 1]$, it suffices to set a bound on the probability that there is a collision at all.

Among the queries $x_1, x_2, \ldots, x_q$ submitted by $A'$ to $f \circ H_{k_1}$, the event $C$ of a collision in $H_{k_1}$ has probability \begin{multline*} \Pr[C] = \Pr[\exists i < j\colon H_{k_1}(x_i) = H_{k_1}(x_j)] \\ \leq \sum_{i<j} \Pr[H_{k_1}(x_i) = H_{k_1}(x_j)] \leq \sum_{i<j} \varepsilon = \binom{q}{2} \varepsilon. \end{multline*} In the event $\lnot C$ that the queries do not collide in $H_{k_1}$, the values $f(H_{k_1}(x_i))$ are independent and uniformly distributed, identical to the distribution of the $U(x_i)$. Hence necessarily $\Pr[A'(f \circ H_{k_1}) = 1 \mid \lnot C] = \Pr[A'(U) = 1]$, so that \begin{align*} \Pr[A'(f \circ H_{k_1}) = 1] &= \Pr[A'(f \circ H_{k_1}) = 1 \mid C]\,\Pr[C] \\ &\quad + \Pr[A'(f \circ H_{k_1}) = 1 \mid \lnot C]\,\Pr[\lnot C] \\ &\leq \Pr[C] + \Pr[A'(f \circ H_{k_1}) = 1 \mid \lnot C] \\ &\leq \binom{q}{2} \varepsilon + \Pr[A'(U) = 1], \end{align*} and thus $\Pr[A'(f \circ H_{k_1}) = 1] - \Pr[A'(U) = 1] \leq \binom{q}{2} \varepsilon$. In the other direction, dropping the nonnegative first term gives $\Pr[A'(f \circ H_{k_1}) = 1] \geq \Pr[A'(U) = 1]\,(1 - \Pr[C]) \geq \Pr[A'(U) = 1] - \binom{q}{2} \varepsilon$, so $\lvert\Pr[A'(f \circ H_{k_1}) = 1] - \Pr[A'(U) = 1]\rvert \leq \binom{q}{2} \varepsilon$.

Summing up, \begin{align*} \operatorname{Adv}^{\operatorname{PRF}}_{S'}(A') &= \lvert\Pr[A'(S'_{k_1,k_2}) = 1] - \Pr[A'(U) = 1]\rvert \\ &\leq \lvert\Pr[A'(F_{k_2} \circ H_{k_1}) = 1] - \Pr[A'(f \circ H_{k_1}) = 1]\rvert \\ &\quad + \lvert\Pr[A'(f \circ H_{k_1}) = 1] - \Pr[A'(U) = 1]\rvert \\ &\leq \operatorname{Adv}^{\operatorname{PRF}}_F(A) + \binom{q}{2} \varepsilon, \end{align*} QED.
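The union bound on $\Pr[C]$ used in the proof above can be checked exhaustively on a toy hash family. The sketch below is only an illustration under assumptions: it uses a two-block polynomial hash over the prime field of size 101 (so $\varepsilon = 2/101$, because the difference of two distinct messages is a nonzero polynomial in $k$ of degree at most 2), and it covers only a fixed, non-adaptive list of queries, whereas the proof also handles adaptive ones. The names `toy_hash` and `collision_probability` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

P = 101  # toy prime, small enough to enumerate every key

def toy_hash(k: int, m: tuple[int, int]) -> int:
    """H_k(m1, m2) = m1 * k^2 + m2 * k mod P."""
    m1, m2 = m
    return (m1 * k * k + m2 * k) % P

def collision_probability(queries: list[tuple[int, int]]) -> Fraction:
    """Exact probability, over uniform k, that some two queries collide."""
    bad_keys = sum(
        any(toy_hash(k, x) == toy_hash(k, y) for x, y in combinations(queries, 2))
        for k in range(P)
    )
    return Fraction(bad_keys, P)

queries = [(1, 0), (0, 1), (2, 3), (5, 8), (13, 21)]
eps = Fraction(2, P)                   # pairwise collision bound for toy_hash
pr_c = collision_probability(queries)  # exact Pr[C] for these queries
bound = comb(len(queries), 2) * eps    # binom(q, 2) * eps
assert pr_c <= bound
```

The exact probability is usually well below the bound, since the union bound counts a key once for every pair it makes collide.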

This follows the structure of the proof of Lemma 3.3 in:

Shay Gueron and Yehuda Lindell, ‘GCM-SIV: Full Nonce Misuse-Resistant Authenticated Encryption at Under One Cycle per Byte’, in Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 109–119.

Variants of the theorem appear in many earlier papers, including the MDx-MAC paper that preceded the creation of HMAC, and the HMAC/NMAC security papers.

fgrieu · Answer 2 · 2018-05-15T12:16:51.150

My reading is that one can't exhibit the counterexample asked for under the definition of universal hash function given in the comment, when that definition is read as stating that $H_{k_1}: m\mapsto H(k_1,m)$ is collision-resistant for fixed $k_1$, including when $k_1$ is random and public.

That follows from the following proposition, and the remark that turning $k_1$ from public to secret can't harm security.

Proposition: Applying a secure PRF $F_{k_2}: h\mapsto F(k_2,h)$ with random secret constant $k_2$ to the output of a public collision-resistant function $H$ yields a secure MAC (at worst only slightly less secure than the weaker of $F$ and $H$).

That proposition holds because distinguishing $F_{k_2}(H(m_i))$ from random, for random secret $k_2$ and chosen distinct messages $m_i$, requires breaking the indistinguishability of $F_{k_2}$ or the collision-resistance of the public function $H$. Proof sketch: given hypothetical distinct messages $m_i$ that allow distinguishing $F_{k_2}(H(m_i))$ from random, if there is a collision among the $h_i=H(m_i)$, that exhibits a pair of $m_i$ breaking the collision-resistance of $H$; otherwise, we can distinguish the $F_{k_2}(h_i)$ from random for chosen distinct $h_i$ that we can compute from the hypothesized $m_i$ (since $H$ is public), thus breaking the indistinguishability of $F$.

As is apparent from the many revisions and the convoluted argument surrounding $k_1$, I'm struggling quite a bit with that one, especially when I use the more formal definition of (not-necessarily-strongly) universal hash function: $H:\mathcal K\times\mathcal M\to\mathcal T$ is a family of universal hash functions when $$\forall(m,m')\in\mathcal M^2,\quad m\ne m'\implies\mathsf{Pr}_{k\in\mathcal K}\Big[H(k,m)=H(k,m')\Big]=\frac1{|\mathcal T|}$$
