
Suppose we have:

  • A function $F: \{0, 1\}^{\ell_c} \times \{0, 1\}^{\ell_x} \rightarrow \{0, 1\}^{\ell_y}$.
  • A function $G: \{0, 1\}^{\ell_y} \rightarrow \{0, 1\}^{n}$.
  • A sequence of data blocks $x_0, \ldots, x_{m-1}$.

If the hash is $h = G(y)$, then is there any difference in security between $$y = \bigoplus\limits_{i=0}^{m-1} F(i, x_i)$$ versus $$y = \sum\limits_{i=0}^{m-1} F(i, x_i)$$ (the latter using modular addition)? I read an answer somewhere on this site saying that XOR sums are trivial to produce collisions for, whereas addition sums are significantly harder.

Melab

3 Answers


The scheme with XOR is insecure, because of the Gaussian elimination attacks explained by others.

The scheme with addition is known as AdHash and was proposed by Bellare et al. It has been conjectured to offer 80-bit security when the modulus is approximately 1600 bits or longer; see https://crypto.stackexchange.com/a/5231/351. With a small modulus, like $2^{32}$, it is insecure. (For the dependence on the modulus size, see Section 4, the paragraphs labelled "AdHash", in "A Generalized Birthday Problem" by Wagner.)

You might also be interested in MuHash, which appears to provide stronger security (i.e., a smaller modulus appears to suffice).
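
For concreteness, here is a minimal MuHash-style sketch (an illustration of the idea, not the reference construction): per-block hashes are combined by modular multiplication, which also yields cheap incremental updates via modular inverses. SHA-256 as the per-block function and the 127-bit toy modulus are my assumptions; real parameters are much larger (e.g., a ~3072-bit modulus).

```python
# Minimal MuHash-style sketch (illustration only): per-block hashes are
# multiplied modulo a prime rather than added. SHA-256 and the 127-bit
# toy modulus are assumptions, not part of any specification.
import hashlib

P = (1 << 127) - 1   # a Mersenne prime; toy-sized modulus

def block_hash(i, x):
    """Per-block value, modeled with SHA-256 over (index, block)."""
    h = hashlib.sha256(i.to_bytes(8, "big") + x).digest()
    return int.from_bytes(h, "big") % P   # a zero value is vanishingly rare

def muhash(blocks):
    y = 1
    for i, x in enumerate(blocks):
        y = (y * block_hash(i, x)) % P
    return y

def update(y, j, old, new):
    """Incremental update: replace block j without rehashing everything."""
    y = (y * pow(block_hash(j, old), -1, P)) % P   # divide out the old block
    return (y * block_hash(j, new)) % P            # multiply in the new one

blocks = [b"alpha", b"beta", b"gamma"]
y = muhash(blocks)
blocks[1] = b"BETA"
assert update(y, 1, b"beta", b"BETA") == muhash(blocks)
```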

D.W.

It indeed appears to be the case.

In the XOR case, that is:

$$y = \bigoplus\limits_{i=0}^{m-1} F(i, x_i)$$

Gaussian elimination (an $O(n^3)$ algorithm) can be used to find collisions and preimages (and to decide whether they exist). This works no matter what the outputs of the $F$ functions are (increasing the size of the bitstrings only increases $n$). It does assume that $m$ is moderately large (but then, if we're considering a hash of a potentially large input, that is the case).
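
For concreteness, here is a toy Python sketch of that attack under simplifications of my own (one-bit blocks, $F$ modeled as a random table). It tracks the row operations during the elimination, so a linear dependency among the per-block differences translates directly into a set of blocks to flip:

```python
# Toy sketch: collisions for y = XOR_i F(i, x_i) via Gaussian elimination
# over GF(2). One-bit blocks and a random-table F are illustrative
# assumptions, not part of the original scheme.
import os

L_Y = 32   # output length of F in bits (toy size)
M   = 40   # number of blocks; M > L_Y forces a linear dependency

_table = {}
def F(i, x):
    """Stand-in for the public function F: a cached random L_Y-bit value."""
    if (i, x) not in _table:
        _table[(i, x)] = int.from_bytes(os.urandom(L_Y // 8), "big")
    return _table[(i, x)]

def xor_hash(x):
    y = 0
    for i, xi in enumerate(x):
        y ^= F(i, xi)
    return y

# d[i] = F(i,0) xor F(i,1) is the change to the hash from flipping block i.
d = [F(i, 0) ^ F(i, 1) for i in range(M)]

# Gaussian elimination over GF(2), tracking which d[i] were combined.
# pivots maps a leading-bit position to (reduced vector, combination mask).
pivots = {}
dependency = None
for i, v in enumerate(d):
    mask = 1 << i                    # combination starts as {d[i]}
    while v:
        p = v.bit_length() - 1       # position of the leading 1 bit
        if p not in pivots:
            pivots[p] = (v, mask)    # new pivot row
            break
        pv, pmask = pivots[p]
        v ^= pv                      # eliminate the leading bit
        mask ^= pmask
    else:
        dependency = mask            # v reached 0: XOR of these d[i] is 0
        break

# Flipping exactly the blocks in `dependency` leaves the hash unchanged.
x1 = [0] * M
x2 = [(dependency >> i) & 1 for i in range(M)]
assert x1 != x2 and xor_hash(x1) == xor_hash(x2)
print("collision: flip blocks", [i for i in range(M) if (dependency >> i) & 1])
```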

On the other hand, in the addition case:

$$y = \sum\limits_{i=0}^{m-1} F(i, x_i)$$

It turns out to be an NP-hard problem (!). That is, if you could solve it quickly in a generic way (where "quickly" means "in polynomial time") for any instance of the problem, you could solve any problem in NP quickly. Hence, we do not believe there is a fast (polynomial-time) algorithm that works on all inputs, and we expect any generic algorithm to take exponential time in the worst case.

That said, the proof of NP-hardness involves very specific $F$ functions (and a very large modulus), and so might not reflect the actual hardness of the cases we are interested in. On the other hand, at first glance, it doesn't appear likely that random inputs (such as a real-world $F$ implementation would approximate) would lend themselves to a faster algorithm than the specific inputs used in the proof. Hence, our guess is that we can't do better than exponential time in the cases we're interested in.
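
For concreteness, the generic problem here is (modular) subset sum. Under the same one-bit-block simplification as above (my assumption), a second preimage of the all-zero input is exactly a nonempty subset of the differences $F(i,1) - F(i,0)$ that sums to $0 \bmod 2^n$; a toy brute-force sketch:

```python
# Toy sketch: a second preimage for y = sum_i F(i, x_i) mod 2^n is a
# modular subset-sum solution. Sizes here are tiny so brute force works;
# at real sizes the generic problem is believed exponentially hard.
from itertools import combinations
import random

N_BITS = 8                 # toy modulus 2^8; real schemes use far larger n
MOD = 1 << N_BITS
M = 12                     # number of one-bit blocks (toy)

random.seed(1)
F = [[random.randrange(MOD) for _ in range(2)] for _ in range(M)]  # F[i][x]

def add_hash(x):
    return sum(F[i][xi] for i, xi in enumerate(x)) % MOD

target = add_hash([0] * M)

# Flipping block i shifts the hash by d[i] mod 2^n.
d = [(F[i][1] - F[i][0]) % MOD for i in range(M)]

def find_flip_set():
    """Brute-force search for a nonempty subset of d summing to 0 mod 2^n."""
    for r in range(1, M + 1):
        for S in combinations(range(M), r):
            if sum(d[i] for i in S) % MOD == 0:
                return S
    return None

S = find_flip_set()
if S is not None:
    x2 = [1 if i in S else 0 for i in range(M)]
    assert add_hash(x2) == target
    print("second preimage flips blocks", S)
else:
    print("no solution at this toy size")
```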

Now, Wagner's algorithms are still exponential, but they reduce the exponent significantly. This is a great help in practice (and must certainly be taken into account when sizing the problem for a target hardness); however, it is still slower than the polynomial time we got in the XOR case.

poncho

I don't believe that's the case. The basic $k$-sum algorithm, originally due to Wagner (CRYPTO 2002, "A Generalized Birthday Problem"), works with any group operation on a finite group.

In the XOR case, the filtering step in the two-way merges of the tree accepts a pair of candidate vectors if their XOR sum has (say) $k$ leading zeroes. To be specific, let $\ell_y$ be the bitlength of the hashes, and let the sum in question be an $m$-sum as in your example.

Wagner's algorithm fixes $k = \ell_y/(1 + \lceil \log_2 m \rceil)$ and uses that value in the filtering step described above.

If you think of the integers the bit patterns represent, a pair of candidate vectors is accepted if the highest $k$ bits of their XOR sum are zero, which is the same as the corresponding integer being $< 2^{\ell_y - k}$.

In the modular addition case you can instead accept pairs that fall into a subset of the group via a threshold check: accept a pair if and only if the modular sum lies in the interval $[0, \lceil 2^{-k} N \rceil]$, where $N = 2^{\ell_y}$, so that the acceptance region has the same size in both cases and the two problems are comparable.

Wagner states this generalization in his paper; he even specifically mentions the group $(\mathbb{Z}/2^n\mathbb{Z}, +)$ that you are asking about, on page 6 of the paper.
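
To make the filtering concrete, here is a toy sketch of the 4-list ($k = 4$) case of Wagner's algorithm over $(\mathbb{Z}/2^n\mathbb{Z}, +)$. For simplicity it filters on the low-order bits (pair sums divisible by $2^L$) rather than on a high-order interval; either way the candidates are restricted to a $2^{-L}$ fraction of the group, which is what drives the analysis. Names and sizes are illustrative:

```python
# Toy 4-list Wagner k-sum over (Z/2^N Z, +): find x1+x2+x3+x4 = 0 mod 2^N
# in time/space about 2^(N/3) instead of the birthday bound 2^(N/2).
import random
from collections import defaultdict

N = 24                  # group is Z / 2^N Z (toy size)
MOD = 1 << N
L = 9                   # filter width per level (Wagner's balanced choice
                        # is about N/3; slightly larger boosts success odds)
SIZE = 1 << L           # list size 2^L

random.seed(7)
lists = [[random.randrange(MOD) for _ in range(SIZE)] for _ in range(4)]

def merge(A, B):
    """First-level filtering: keep pairs with (a + b) % 2^L == 0, i.e.
    pair sums confined to a 2^-L fraction of the group."""
    by_low = defaultdict(list)
    for j, b in enumerate(B):
        by_low[b % (1 << L)].append(j)
    out = []
    for i, a in enumerate(A):
        for j in by_low[(-a) % (1 << L)]:
            out.append(((a + B[j]) % MOD, (i, j)))
    return out

P12 = merge(lists[0], lists[1])
P34 = merge(lists[2], lists[3])

# Second level: among filtered pair sums, look for s12 + s34 = 0 mod 2^N.
by_sum = defaultdict(list)
for s, pair in P34:
    by_sum[s].append(pair)

hit = None
for s, (i1, i2) in P12:
    matches = by_sum.get((-s) % MOD)
    if matches:
        i3, i4 = matches[0]
        hit = (i1, i2, i3, i4)
        break

if hit:
    i1, i2, i3, i4 = hit
    total = (lists[0][i1] + lists[1][i2] +
             lists[2][i3] + lists[3][i4]) % MOD
    assert total == 0
    print(f"4-sum to 0 mod 2^{N} at indices {hit}")
else:
    print("no 4-sum at this toy size; try another seed")
```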

kodlu