
Can anybody tell me why the expected value of the Index of Coincidence is the following?

For a ciphertext string of length $L$, where $n$ is the number of characters in the alphabet, $p_i$ is the probability of the $i$-th letter in the plaintext language, $k_r = 1/n$, and $k_p = \sum_{i=1}^n p_i^2$:

$$ E(IC) = \frac{1}{t}\cdot\frac{L - t}{L - 1}\cdot k_p + \frac{t - 1}{t}\cdot\frac{L}{L - 1}\cdot k_r $$


I forgot to state that $t$ is the period of the polyalphabetic cipher used to encipher the message.

Matteo

1 Answer


"Expected Index of Coincidence" usually refers to the expected index of coincidence of a language (about 1.73 for English when normalised, or about 0.067 if you're not normalising).

The formula in question is usually used to determine the length of the key ($t$) given the (measured) $IC$ of received cipher-text. $IC$ is the probability that two randomly-selected letters from the cipher text are identical.
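As a rough illustration of that definition, the measured $IC$ can be computed directly from letter counts. This is only a minimal Python sketch: the function name `index_of_coincidence` and the decision to ignore case and non-alphabetic characters are assumptions made for the example.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two distinct, randomly chosen positions hold the same letter."""
    # Assumption for this sketch: ignore case and anything that is not a letter.
    letters = [c for c in text.upper() if c.isalpha()]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    # Matching ordered pairs divided by all ordered pairs of distinct positions.
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))
```

This is the unnormalised index, so English plaintext should come out near 0.067 and uniformly random letters near $1/26$.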

Let $X=\{ x_1,x_2,x_3,\dots,x_L\}$ be the cipher-text. If we think the poly-alphabetic cipher has period $t$, then we would expect each of the following:

\begin{eqnarray} & X_1 =\{x_1, x_{t+1},x_{2t+1},...\}\\ & X_2 =\{x_2, x_{t+2},x_{2t+2},...\}\\ & \vdots \\ &X_t =\{x_t, x_{2t},x_{3t},...\} \end{eqnarray}

to exhibit the same index as the plaintext ($\kappa_p$). So we can reconstruct $IC(X)$ as follows: pick two letters at random and ask for the probability that they match.

The probability that they are in the same $X_i$ is:

$$ \frac{t\,C(\frac{L}{t},2)}{C(L,2)} = \frac{L\left(\frac{L}{t}-1\right)}{L(L-1)} $$

The probability they are in different $X_i$s is:

$$ \frac{C(t,2)\cdot\frac{L}{t}\cdot\frac{L}{t}}{C(L,2)} = \frac{t(t-1)\cdot\frac{L}{t}\cdot\frac{L}{t}}{L(L-1)} $$
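As a quick sanity check, these two probabilities should sum to 1. This assumes, as the counting does, that $t$ divides $L$, so that every $X_i$ has exactly $L/t$ letters. The first fraction simplifies to $\frac{L-t}{t(L-1)}$ and the second to $\frac{(t-1)L}{t(L-1)}$, so

$$ \frac{L-t}{t(L-1)} + \frac{(t-1)L}{t(L-1)} = \frac{L - t + tL - L}{t(L-1)} = \frac{t(L-1)}{t(L-1)} = 1 $$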

  • If they are in the same $X_i$ then they are both enciphered using the same alphabet, so the probability that they match is $\kappa_p$
  • If they are in different $X_i$s then they are enciphered using different alphabets, so we can assume they are randomly distributed and the probability that they match is $\kappa_r$

So the probability of two random letters matching is approximately

\begin{eqnarray} IC(X) & \approx \frac{L\left(\frac{L}{t}-1\right)}{L(L-1)}\kappa_p + \frac{t(t-1)\cdot\frac{L}{t}\cdot\frac{L}{t}}{L(L-1)}\kappa_r\\ & = \frac{(L-t)}{t(L-1)}\kappa_p + \frac{(t-1)L}{t(L-1)}\kappa_r \end{eqnarray}
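Since the formula is typically used to recover $t$ from a measured $IC$, it may help to see it evaluated and inverted numerically. This is a hedged Python sketch: the function names `expected_ic` and `estimate_period` are made up for the example, and the defaults assume English ($\kappa_p \approx 0.067$) over a 26-letter alphabet ($\kappa_r = 1/26$). Solving the last line for $t$ gives $t \approx \frac{L(\kappa_p - \kappa_r)}{(L-1)\,IC - L\kappa_r + \kappa_p}$.

```python
def expected_ic(L: int, t: int, kappa_p: float = 0.067, kappa_r: float = 1 / 26) -> float:
    """Approximate IC of a length-L ciphertext under a period-t polyalphabetic cipher."""
    same = (L - t) / (t * (L - 1))      # both letters fall in the same X_i
    diff = (t - 1) * L / (t * (L - 1))  # letters fall in different X_i
    return same * kappa_p + diff * kappa_r


def estimate_period(ic: float, L: int, kappa_p: float = 0.067, kappa_r: float = 1 / 26) -> float:
    """Invert expected_ic for t; the result is a real-valued estimate, not an integer."""
    return L * (kappa_p - kappa_r) / ((L - 1) * ic - L * kappa_r + kappa_p)


# Round trip: a period-5 cipher on 1000 letters should be recovered as roughly 5.
print(estimate_period(expected_ic(1000, 5), 1000))
```

With $t = 1$ the expression reduces to $\kappa_p$, as it should for a monoalphabetic cipher. On real ciphertext the estimate is only approximate, and it becomes unreliable once the measured $IC$ is close to $\kappa_r$, because the denominator is then close to zero.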

Stephen Harris