A verifier wants to ensure, with little data exchanged with other systems, that a large block of data $M$ it holds is also available to some other system(s). Keeping $M$ secret is not an objective, nor is ensuring that the other systems hold $M$ in any particular form (it is fine for $M$ to be stored compressed, or distributed across the other systems).
Based on my limited understanding of Carter-Wegman hashes, this seems ideal ground for the following protocol:
- The verifier chooses a random primitive polynomial $P$ of degree $k$ with binary coefficients, and publishes/broadcasts it as a message of $k-1$ bits (the coefficients of degree $k$ and $0$ are known to be $1$ and need not be sent; this could be compacted a little further by broadcasting instead the seed of some CSPRNG used to generate $P$);
- The verifier computes the remainder of the division of polynomial $M$ (the bits of $M$ being its binary coefficients) by polynomial $P$; in other words, it computes a CRC of message $M$ with polynomial $P$;
- The verifier receives a message, computed by the other system(s) to prove that they collectively hold $M$, and is satisfied if it consists of $k$ bits matching the remainder it computed.
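For illustration only, the three steps above can be sketched in Python. This is a minimal sketch under assumptions that are not in the original: a polynomial over GF(2) is stored as a Python int whose bit $i$ is the coefficient of $x^i$; the first bit of $M$ is its highest-degree coefficient; $k = 64$, so that the known factorization $2^{64}-1 = 3\cdot 5\cdot 17\cdot 257\cdot 641\cdot 65537\cdot 6700417$ can be used to test primitivity; and every function name (`random_primitive_poly`, `remainder`, `verify`, etc.) is hypothetical. The bit-at-a-time loop is written for clarity and says nothing about achievable speed.

```python
import secrets

# Assumption: a polynomial over GF(2) is a Python int whose bit i is the coefficient of x**i.
K = 64
# Distinct prime factors of 2**64 - 1, needed by the primitivity test for K = 64.
FACTORS_2_64_MINUS_1 = (3, 5, 17, 257, 641, 65537, 6700417)


def mulmod(a: int, b: int, p: int, k: int) -> int:
    """Return a*b mod p in GF(2)[x]; p has degree k, a and b have degree < k."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> k) & 1:  # degree reached k: subtract (XOR) p
            a ^= p
    return result


def powmod(base: int, e: int, p: int, k: int) -> int:
    """Return base**e mod p in GF(2)[x] by square-and-multiply."""
    result = 1
    while e:
        if e & 1:
            result = mulmod(result, base, p, k)
        base = mulmod(base, base, p, k)
        e >>= 1
    return result


def is_primitive(p: int, k: int, factors) -> bool:
    """True iff x has multiplicative order exactly 2**k - 1 modulo p.

    p must have degree k and constant term 1; `factors` must list the
    distinct prime factors of 2**k - 1.
    """
    # Cheap filter: x**(2**k) == x mod p, i.e. x**(2**k - 1) == 1 since x is invertible.
    t = 2  # the polynomial x
    for _ in range(k):
        t = mulmod(t, t, p, k)
    if t != 2:
        return False
    order = (1 << k) - 1
    return all(powmod(2, order // q, p, k) != 1 for q in factors)


def random_primitive_poly(k: int = K, factors=FACTORS_2_64_MINUS_1) -> int:
    """Rejection-sample degree-k polynomials (x**k and 1 terms set) until one is primitive."""
    while True:
        p = (1 << k) | (secrets.randbits(k - 1) << 1) | 1
        if is_primitive(p, k, factors):
            return p


def encode_challenge(p: int, k: int = K) -> int:
    """The k-1 bits actually broadcast: coefficients of x**1 .. x**(k-1)."""
    return (p >> 1) & ((1 << (k - 1)) - 1)


def decode_challenge(c: int, k: int = K) -> int:
    """Rebuild P from the broadcast bits by restoring the x**k and 1 terms."""
    return (1 << k) | (c << 1) | 1


def remainder(message: bytes, p: int, k: int = K) -> int:
    """M(x) mod p, reading message bits most significant first (Horner's rule).

    No initial value, final XOR or padding, unlike many standardized CRCs.
    """
    r = 0
    for byte in message:
        for i in range(7, -1, -1):
            r = (r << 1) | ((byte >> i) & 1)
            if (r >> k) & 1:
                r ^= p
    return r


def verify(message: bytes, p: int, response: int, k: int = K) -> bool:
    """Verifier side: accept iff the response is a k-bit value equal to its own remainder."""
    return 0 <= response < (1 << k) and response == remainder(message, p, k)
```

For example, a prover would compute `remainder(M, decode_challenge(c))` from the broadcast challenge `c = encode_challenge(P)`, and the verifier would call `verify(M, P, response)`. Rejection sampling is cheap here: of the $2^{63}$ candidates, $\varphi(2^{64}-1)/64$ are primitive, roughly one in 64.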
Questions:
- Does this protocol meet the stated objective? Can we prove it under some appropriate definition of security, with a quantitative bound as a function of $k$ and of the number $n$ of iterations made for the same $M$ (and perhaps of the size of the broadcast message, if that can be made much lower than $k$, and of the size $m$ of $M$, if that matters)?
- What if we replace the condition that polynomial $P$ is primitive by some weaker condition?
- What speed (in bits per second) is possible on an actual CPU, such as a modern x86-64 or ARM CPU, compared to other means (perhaps HMAC-MD5, or CBC-MAC-AES with the challenge as key)?
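For orientation on the first two questions, here is the standard counting argument usually given for Rabin-style fingerprints, sketched under the assumption that $P$ is drawn uniformly among the allowed polynomials and that some substitute data $M' \ne M$ of at most $m$ bits is fixed before $P$ is revealed. It only bounds the chance that such an $M'$ happens to give the same remainder; it is not by itself a proof that the provers hold $M$.

$$
M \bmod P = M' \bmod P \iff P \mid D, \qquad D = M \oplus M' \ne 0,\quad \deg D < m .
$$

A primitive polynomial is irreducible, and a nonzero $D$ of degree less than $m$ has at most $(m-1)/k$ distinct irreducible divisors of degree $k$. There are $\varphi(2^k-1)/k$ primitive polynomials of degree $k$ ($\varphi$ being Euler's totient) and $I_k = \frac{1}{k}\sum_{d \mid k}\mu(d)\,2^{k/d} \approx 2^k/k$ irreducible ones ($\mu$ being the Möbius function), hence

$$
\Pr_{P\ \text{primitive}}[P \mid D] \le \frac{(m-1)/k}{\varphi(2^k-1)/k} < \frac{m}{\varphi(2^k-1)},
\qquad
\Pr_{P\ \text{irreducible}}[P \mid D] \le \frac{(m-1)/k}{I_k} \approx \frac{m}{2^k}.
$$

With $n$ independently drawn challenges, the bound for a fixed $M'$ is raised to the $n$-th power. For $k = 64$, $\varphi(2^{64}-1) \approx 0.499 \cdot 2^{64}$, so in this bound requiring primitivity rather than mere irreducibility costs about one bit.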
Late update: the scheme appears to be Rabin fingerprinting, or closely related to it. One difference is that in Rabin fingerprinting the polynomial is chosen irreducible, not necessarily primitive. Another is that Rabin fingerprints use proper padding, which I forgot: as stated, my technique fails to prove that the provers know the number of leading 0 bits of $M$ (prepending 0 bits to $M$ leaves the remainder unchanged).
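The missing-padding problem can be seen directly with a plain remainder: prepending zero bytes to $M$ does not change the result. The minimal Python sketch below uses hypothetical conventions (bit $i$ of an int is the coefficient of $x^i$; the first message bit is the highest-degree coefficient) and contrasts the plain remainder with one simple, hypothetical fix: prepending a single 1 bit, similar in spirit to the nonzero initial value many CRCs use. It is not necessarily the padding Rabin fingerprints use.

```python
def absorb(r: int, message: bytes, p: int, k: int) -> int:
    """Continue Horner's rule from register r over the bits of message, reducing mod p."""
    for byte in message:
        for i in range(7, -1, -1):
            r = (r << 1) | ((byte >> i) & 1)
            if (r >> k) & 1:
                r ^= p
    return r


def remainder(message: bytes, p: int, k: int) -> int:
    """Plain remainder M(x) mod p, as in the protocol: leading zero bits are invisible."""
    return absorb(0, message, p, k)


def remainder_marked(message: bytes, p: int, k: int) -> int:
    """Hypothetical fix: reduce x**(8*len(message)) + M(x), i.e. M with one 1 bit prepended.

    Distinct byte strings now map to distinct polynomials, so prepending zeros
    changes the polynomial being reduced.
    """
    return absorb(1, message, p, k)  # register starts at the prepended 1 bit (assumes k >= 1)
```

With the marked variant, the remainder also depends on the exact bit length of $M$, up to the usual small collision probability.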