
Can we define a finite set of input strings for a SHA-3 hash (or SHAKE XOF) function such that the collision probability is arguably lower than for a random oracle, with a definition of the set that makes no reference to the function? If so, what is an example for which the ratio of the function's collision probability to the oracle's is particularly low?

The definition of the set must not use constants obtained by evaluating the function, nor its internal constants (either would count as a reference to the function). There must be a demonstration or heuristic argument of the desired property that relies only on the (sponge) structure of the function and its internal parameters (rate, capacity…), rather than on evaluation.

The collision probability is for two elements of the set drawn uniformly at random without replacement, or another collision probability to be proposed.

I understand that no experimental verification using the function as a black box is possible, for it would be a break.


The question is inspired by this comment, which suggests that $\operatorname{SHAKE}(y\mathbin\Vert z)$ has fewer collisions than expected for a random oracle if the length of $y$ is less than the rate, subject to some other conditions.

fgrieu

1 Answer


Well, since I made the comment that fgrieu responded to, I suppose I should be the one to answer.

I was using an intuitive notion; when we look at it analytically, SHAKE does indeed have an (exceedingly small) lower collision probability (although not "demonstrably lower").

Background on how SHAKE works internally for short inputs (shorter than the "rate", which is 168 bytes for SHAKE-128): SHAKE places the input into a 1600-bit array, with the rest of the bits set to fixed values. It then applies a permutation (which we will model as a random permutation) to that 1600-bit array. Then, for the first 168 bytes of output (for SHAKE-128), it reads bits out of the array; if more output is requested, it applies the permutation again and outputs the next 168 bytes from there.
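The output side of this description can be observed with Python's standard `hashlib`. This is only a sketch of the interface behaviour, not of the internals: the message `msg` is a hypothetical example, and the check merely confirms that asking for more output extends the same stream, so the first 168 bytes are the same however much output is requested.

```python
import hashlib

RATE_BYTES = 168  # SHAKE-128 rate in bytes, as stated above

# Hypothetical example message, chosen only to be shorter than the rate.
msg = b"short input"
assert len(msg) < RATE_BYTES

# First 168 bytes of output: in the description above, these are read
# from the state after a single permutation.
first_block = hashlib.shake_128(msg).digest(RATE_BYTES)

# Requesting less or more output yields a prefix or an extension of the
# same output stream.
shorter = hashlib.shake_128(msg).digest(32)
longer = hashlib.shake_128(msg).digest(2 * RATE_BYTES)

print(first_block[:32] == shorter)
print(longer[:RATE_BYTES] == first_block)
```

Nothing here tests the collision claim; it only illustrates which 168 bytes the argument is about.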

So, if we consider two distinct short inputs $A$ and $B$, the states after the permutation will be random, except that we know they are different (distinct inputs give distinct states before the permutation, and a permutation maps distinct states to distinct states). That is, if we consider the $A$ state as fixed, the $B$ state will be one of the $2^{1600}-1$ other values. Of those $2^{1600}-1$ values, $2^{256}-1$ agree with the $A$ state on the first 1344 bits (168 bytes): the remaining 256 bits can take $2^{256}$ values, one of which gives the $A$ state itself. Hence, the probability that the first 168 output bytes for both $A$ and $B$ will be the same is:

$$\frac{2^{256} - 1}{2^{1600}-1}$$

This value is (slightly) smaller than the $2^{-1344} = 2^{256} / 2^{1600}$ value we get for a random oracle.
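The comparison can be checked with exact rational arithmetic. The following is a minimal sketch under the same random-permutation model; the names (`p_shake`, `p_oracle`, `deficit`, `count_matching_states`) are illustrative only, and the toy check treats the "first" bits of a state as the high-order bits of an integer, which is an arbitrary convention that does not affect the count.

```python
from fractions import Fraction

STATE_BITS = 1600
RATE_BITS = 1344                          # 168 bytes, SHAKE-128
CAPACITY_BITS = STATE_BITS - RATE_BITS    # 256

# Collision probability on the first 168 output bytes, per the count above.
p_shake = Fraction(2**CAPACITY_BITS - 1, 2**STATE_BITS - 1)
# Collision probability of a random oracle with a 1344-bit output.
p_oracle = Fraction(1, 2**RATE_BITS)

ratio = p_shake / p_oracle
deficit = 1 - ratio   # relative shortfall compared with the oracle


def count_matching_states(state_bits, rate_bits, a_state):
    """Brute-force count of states b != a_state that agree with a_state
    on the first rate_bits bits (taken here as the high-order bits)."""
    shift = state_bits - rate_bits
    return sum(
        1
        for b in range(2**state_bits)
        if b != a_state and (b >> shift) == (a_state >> shift)
    )


# Toy parameters (8-bit state, 5-bit rate) so that enumeration is feasible;
# the count should equal 2**3 - 1.
toy_count = count_matching_states(8, 5, a_state=0b10110011)

print(p_shake < p_oracle)
print(deficit == Fraction(2**RATE_BITS - 1, 2**STATE_BITS - 1))
print(toy_count == 2**3 - 1)
```

The relative shortfall works out to $(2^{1344}-1)/(2^{1600}-1)$, which is just under $2^{-256}$, so the ratio of the two collision probabilities differs from 1 by an amount far too small to observe.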

poncho