6

The definition I have is:

Two probability ensembles $X = \{X_n\}_{n \in \mathbf{N}}$ and $Y = \{Y_n\}_{n \in \mathbf{N}}$ are computationally indistinguishable if for every probabilistic polynomial-time algorithm $D$, every positive polynomial $p(\cdot)$, and all sufficiently large $n$ we have $$ |P(D(X_n, 1^n) = 1) - P(D(Y_n, 1^n) = 1)| < \frac{1}{p(n)}. $$

I'm having trouble digesting some ideas and the notation here.

Does $X = \{X_n\}_{n \in \mathbf{N}}$ mean $X = X_1, X_2, \ldots$ where each $X_i$ comes from the same distribution? So if our $X$ distribution is $0$-$1$ coin flips, then $X$ is an infinite sequence of $0$s and $1$s?

A note says "typically, each $X_n$ ranges over strings of length $poly(n)$". What does this mean? The distribution will take $n$ as some sort of input? In my coin-flipping example each $X_i$ gives us back just one bit...

I also want to ask why the $1^n$ is necessary as input to the distinguisher $D$, but I should probably understand the answers to my above questions first.

Dramal

4 Answers

3

$P(D(X_n, 1^n) = 1)$ means the probability that the following algorithm returns $1$:

  1. Sample $x$ according to $X_n$.
  2. Run $D(x, 1^n)$.
  3. Output whatever $D$ outputs.

Say for example that $X_n$ is the uniform distribution on $\{1,2,3,4\}$, and $D(x,1^n)$ is the following:

  1. If $x$ is odd, output $x$ or $x+1$, each with probability $1/2$.
  2. If $x$ is even, output $x$.

Then $P(D(X_n, 1^n)=1)$ is $1/8$ (we need to pick $x=1$, which happens with probability $1/4$, and then output $x$ with probability $1/2$).
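To sanity-check this, here is a small Python sketch (mine, not part of the original answer) that estimates $P(D(X_n, 1^n) = 1)$ for this toy example by sampling; the function names are made up for illustration:

```python
import random

def D(x, n):
    # The toy distinguisher from the example above: on odd x, output x or
    # x+1 with probability 1/2 each; on even x, output x.
    if x % 2 == 1:
        return random.choice([x, x + 1])
    return x

def estimate(trials=100_000, n=4):
    # Monte Carlo estimate of P(D(X_n, 1^n) = 1), where X_n is the uniform
    # distribution on {1, 2, 3, 4}.  The unary input 1^n is represented
    # simply by n here.
    hits = sum(1 for _ in range(trials) if D(random.randint(1, 4), n) == 1)
    return hits / trials

print(estimate())  # prints roughly 0.125 = 1/8
```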

I don't think your example where each $X_i$ is defined on $\{0,1\}$ will be very enlightening. Generally, $X_n$ is the output of some algorithm on input $n$, for example the key generation algorithm of a cryptosystem where $n$ is the desired key length. Then the length of the strings over which $X_n$ is defined really is $\mathsf{poly}(n)$.

In this case where $X_i$ and $Y_i$ are the outputs of algorithms, the idea of the definition is that $D$ is unable to reliably tell whether the string it gets has been "produced" by $X$ or by $Y$.
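To make the distinguishing "game" concrete, here is a hypothetical Python sketch (toy ensembles of my own, not from any real cryptosystem) that estimates the advantage from the definition; because one ensemble is visibly biased, the advantage here is large rather than negligible:

```python
import random

# A hypothetical pair of ensembles, purely for illustration: X_n produces
# n uniform bits, while Y_n produces n bits that are biased towards 1.
def sample_X(n):
    return [random.randint(0, 1) for _ in range(n)]

def sample_Y(n):
    return [1 if random.random() < 0.6 else 0 for _ in range(n)]

def D(s, n):
    # A simple distinguisher: output 1 ("looks like Y") if more than half
    # of the bits are 1.
    return 1 if sum(s) > n / 2 else 0

def advantage(n, trials=10_000):
    # Estimate |P(D(X_n, 1^n) = 1) - P(D(Y_n, 1^n) = 1)|.
    px = sum(D(sample_X(n), n) for _ in range(trials)) / trials
    py = sum(D(sample_Y(n), n) for _ in range(trials)) / trials
    return abs(px - py)

print(advantage(100))  # around 0.5: these two ensembles are distinguishable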

fkraiem
2

No, $X = \{X_n\}_{n \in \mathbf{N}}$ means $X = X_1, X_2, \ldots$ where each $X_i$ is a distribution.
So one could let each $X_i$ be the uniform distribution on strings of length $i$.

The note that each $X_n$ ranges over strings of length $\mathsf{poly}(n)$ means there is a polynomial $q$ such that for all $n$ and $x$, if $X_n$ assigns non-zero probability to $x$, then the length of $x$ is at most $q(n)$.
The distributions do "take $n$ as some sort of input".
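A minimal Python sketch of that reading, assuming each $X_n$ is represented by a sampling procedure that takes $n$ as input (the code is illustrative, not from the answer):

```python
import random

def sample_X(n):
    # X_n as a sampling procedure taking n as input: here, the uniform
    # distribution on bit strings of length n, so q(n) = n works as the
    # polynomial length bound.
    return ''.join(random.choice('01') for _ in range(n))

for n in (1, 4, 16):
    x = sample_X(n)
    assert len(x) <= n  # every string with non-zero probability obeys the bound
    print(n, x)
```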

See these two questions for $1^n$.

1

The $X_i$ could each be distributed differently, but they do not have to be. The idea of an ensemble is just that the distributions are related somehow. A way to think of this is that $X_i$ could be the distribution of the output of some randomized algorithm given input $i$. Consider, for example, the algorithm that on input $i$ outputs a uniformly random bit-string of length $i$. We can now describe the outputs of this algorithm as the ensemble $X = \{X_i\}_{i \in \mathbb{N}}$, where $X_i$ is the uniform distribution on strings of length $i$.

Of course in general the algorithm could do all sorts of things. You could also think of $X_i$ as following the distribution of $i$-bit keys for some encryption scheme (generated by a particular key generation algorithm) or whatever you want.

Of course, algorithms do not have to produce output of increasing length for increasing input $i$, and the same goes for the corresponding ensemble $X = \{X_i\}_{i \in \mathbb{N}}$: the $X_i$ do not have to be distributions over strings of increasing length.

So in your bit-flipping example you could think of $i$ as some input to your bit-flipping algorithm (or protocol) that makes it produce one bit of output. My guess is that your bit-flipping protocol takes some kind of security parameter. Typically, the higher the security parameter, the closer to uniformly random the output of the bit-flip will be. So you could describe the output of the protocol as the ensemble $X = \{X_i\}_{i \in \mathbb{N}}$, where $X_i$ is the output of the protocol when using security parameter $i$. Now consider another ensemble $U = \{U_i\}_{i \in \mathbb{N}}$, where each $U_i$ is a truly random bit. If $U$ and $X$ are computationally indistinguishable according to your definition, this roughly means that if you use a high enough security parameter, then no one will be able to distinguish the output of your bit-flipping algorithm from a truly random bit-flip.
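Here is a toy Python sketch of that intuition; the protocol and its bias $2^{-i}$ are invented for illustration, and the point is only that the estimated advantage shrinks as the security parameter grows:

```python
import random

def sample_X(i):
    # A made-up bit-flipping protocol: it outputs 1 with probability
    # 1/2 + 2^(-i), so its bias away from uniform shrinks as the
    # security parameter i grows.
    return 1 if random.random() < 0.5 + 2.0 ** (-i) else 0

def sample_U(i):
    # U_i: a truly random bit; the security parameter is ignored.
    return random.randint(0, 1)

def advantage(i, trials=200_000):
    # For one-bit outputs, the distinguisher that simply outputs the bit
    # it sees is optimal; its advantage equals the bias 2^(-i).
    px = sum(sample_X(i) for _ in range(trials)) / trials
    pu = sum(sample_U(i) for _ in range(trials)) / trials
    return abs(px - pu)

for i in (2, 4, 8):
    print(i, advantage(i))  # shrinks roughly like 2^(-i)
```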

Now for the second part of the question. This is kind of a technicality. $1^n$ is a way to write the number $n$ in unary notation, i.e., $1^n$ just means $n$ ones (so, e.g., $1^2 = 11$, $1^3 = 111$, and so on). Why would you do that? Because if you give an algorithm the input $n$ in binary, then the input is only of size $\log(n)$. Thus, if the algorithm runs in time $\mathsf{poly}(n)$, then you have an exponential-time algorithm*! However, in cryptography we often want to give an algorithm a security parameter $n$ and then have the algorithm run in time $\mathsf{poly}(n)$, but we also want to call such an algorithm poly-time. So by using unary notation, we can cheat a little bit and get what we want. So your bit-flipping protocol would take the security parameter $i$ as $1^i$, and $X_i$ would be the distribution of its output on input $1^i$.

*Because in computer science in general when we call something an exponential time algorithm, we mean exponential in the length of the input.
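A tiny Python sketch of the unary-input convention; the key generator here is hypothetical:

```python
import random

def unary(n):
    # 1^n: the number n written in unary, i.e. a string of n ones.
    return '1' * n

def keygen(unary_n):
    # A made-up key generator.  Its input already has length n (not log n),
    # so running in time poly(n) is genuinely polynomial in the input length.
    n = len(unary_n)
    return ''.join(random.choice('01') for _ in range(n))

print(unary(3))          # '111'
print(keygen(unary(8)))  # a uniformly random 8-bit "key"
```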

Guut Boy
0

I see a few small, nuanced errors in the explanations above, and they are easy mistakes for any of us to make.

Let me clarify this in as straightforward but detailed a manner as I can:

$I$ is a countable set, and a probability ensemble indexed by $I$ is a collection of random variables denoted $\{X_i\}_{i \in I}$. The index set $I$ can either be the natural numbers or an efficiently computable subset of them (Katz & Lindell, 2008). Each $X_i$ is a distribution, and the ensemble is the whole indexed collection. When $I = \mathbb{N}$, the ensemble is a sequence of random variables $X_1, X_2, X_3, \ldots$

Next, all we are doing is taking the difference between the probabilities that $D$ outputs $1$ when run on a sample from $X_n$ and when run on a sample from $Y_n$.

Then we see that pseudo-randomness is just a special case of computational indistinguishability, namely indistinguishability from the uniform ensemble. All this means is that the difference $\epsilon$ is very small, i.e., negligible; the two distributions are close, not exactly the same.

Further reading:

https://wiki.cc.gatech.edu/theory/images/b/b2/Lec5.pdf

Jonathan Katz & Yehuda Lindell (2008) Introduction to Modern Cryptography.

Jacob E Mack