0

Say we were to uniformly sample $k$ times from a bit string with length $n$. What is the expected number of pairs with a Hamming distance $d$? In the limit of Hamming distance 0, I realize this becomes a birthday collision problem, but I am interested in the general case.

For example, say we generate 1000 random 10-bit strings, what is the expected number of strings with a Hamming distance of 3?

I found several related posts: Probability that a set of 'N' random binary strings are all at least a certain Hamming distance 'k' apart , birthday problem near misses , and Average number of strings with edit distance exactly 2 but I could not quite make the additional leap to answer my similar but not exactly the same question.

Thanks in advance for your help!

  • 1
    The question, as you have it worded right now, does not make sense to me. Maybe you meant to ask, "What is the expected number of pairs of sampled strings whose distance is $d$?". Is that a fair interpretation? – Mike Earnest May 12 '22 at 16:38
  • Hi, yes that is what I mean. Thanks for clarifying it – Entangler May 13 '22 at 07:45

2 Answers2

2

Actually this seems quite easy

Let $A^d_{i,j}$ be the event that two strings $s_i$ $s_j$ have distance $d$. Then, if both $s_i,s_j$ are random, we have $$P(A^d_{i,j})= \binom{n}{d} 2^{-n}$$

Then, if we sample $k$ random strings, the expected number of pairs with such distance is

$$ \frac{k(k-1)}{2} \binom{n}{d} 2^{-n}$$

leonbloy
  • 66,202
0

Just adding extra clarity for myself here based on leonbloy's answer:

The probability that a pair of strings has Hamming distance $d$ follows a binomial distribution: $$ P(X=d)=\binom{n}{d}(1/2)^d (1/2)^{n-d} = \frac{1}{2^n}\binom{n}{d} $$

as explained by Probability that hamming distance is d.

Then, this probability is multiplied by the total number of pairings that can be made in a set of $k$ items. This can be found from the series: $(k-1) + (k-2) + ... (k-(k-1))$ which reduces to $\frac{k(k-1)}{2}$