
I am trying to answer the following question: you toss a fair coin $n$ times. What is the expected number of strings of $h$ consecutive heads, assuming overlapping instances are not allowed?

In the standard version of this problem, overlapping occurrences are allowed. In that case, there are $n - h + 1$ windows of length $h$ in the string of $n$ tosses. These windows are the positions at which a string of $h$ consecutive heads can occur, and each one is all heads with probability $\frac{1}{2^h}$. Therefore, by linearity of expectation, the answer is $\frac{n - h + 1}{2^{h}}$. However, I don't think this argument can be applied when overlaps are forbidden.
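A brute-force check of the overlapping formula, as a minimal sketch: it enumerates all $2^n$ equally likely sequences for small $n$ and averages the number of all-heads windows exactly. The helper name `expected_overlapping` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_overlapping(n: int, h: int) -> Fraction:
    """Exact mean number of all-heads windows of length h (overlaps allowed)."""
    total = 0
    for seq in product("HT", repeat=n):        # all 2**n equally likely outcomes
        s = "".join(seq)
        # count every start index i whose window s[i:i+h] is all heads
        total += sum(1 for i in range(n - h + 1) if s[i:i + h] == "H" * h)
    return Fraction(total, 2 ** n)


# compare with the linearity-of-expectation formula (n - h + 1) / 2**h
for n in range(1, 11):
    for h in range(1, n + 1):
        assert expected_overlapping(n, h) == Fraction(n - h + 1, 2 ** h)
```

Exhaustive enumeration is only feasible for small $n$, but that is enough to catch an off-by-one in the window count.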

My Approach

I noticed that a recurrence relation could be formulated here using a first-step analysis of sorts. Define $E(\alpha)$ as the expected number of non-overlapping strings of $h$ heads in $\alpha$ tosses. Then we want to find $E(n)$.

  • We could get $h$ heads in the first $h$ tosses, in which case the expected value is $1 + E(n - h)$. This occurs with probability $\frac{1}{2^h}$.
  • We could get a tail on the first toss, in which case the expected value is $E(n-1)$. This occurs with probability $\frac{1}{2}$.
  • We could get a head and then a tail on the first two tosses, in which case the expected value is $E(n-2)$. This occurs with probability $\frac{1}{4}$.
  • We could get two heads and then a tail on the first three tosses, in which case the expected value is $E(n - 3)$. This occurs with probability $\frac{1}{8}$.

This continues until the final case:

  • We could get $h-1$ heads and then a tail on the first $h$ tosses, in which case the expected value is $E(n-h)$. This occurs with probability $\frac{1}{2^h}$.

We can summarize all of these possibilities in a recurrence relation, valid for $n \ge h$, with $E(\alpha) = 0$ for $\alpha < h$:

$E(n) = \frac{1}{2^h}\left(1 + E(n-h)\right) + \sum_{i=1}^{h}\frac{1}{2^{i}}E(n-i)$

This is where I stopped. I do see that the recurrence relation can be converted into a homogeneous one by writing it for $n-1$, $E(n - 1) = \frac{1}{2^h}\left(1 + E(n-1-h)\right) + \sum_{i=1}^{h}\frac{1}{2^{i}}E(n-1-i)$, and subtracting this from the original relation. This gives

$E(n) - E(n-1) = \frac{1}{2^h}\left(E(n-h) - E(n-h-1)\right) + \sum_{i=1}^{h}\frac{1}{2^{i}}\left(E(n-i)-E(n-i-1)\right)$.

This is far too complicated for me to do anything about now.
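As a sanity check on the first-step analysis, here is a minimal Python sketch that evaluates $E(n)$ exactly with fractions. It uses the case probabilities from the bullet list ($\frac{1}{2^h}$ for $h$ leading heads, $\frac{1}{2^i}$ for $i-1$ heads followed by a tail) and assumes the base case $E(\alpha)=0$ for $\alpha<h$. It then compares the result with a brute-force average over all $2^n$ sequences, where non-overlapping strings are counted greedily from left to right. The names `recurrence_E` and `brute_force_E` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product


def recurrence_E(n: int, h: int) -> Fraction:
    """E(n) from the first-step analysis, assuming E(m) = 0 for m < h."""
    @lru_cache(maxsize=None)
    def E(m: int) -> Fraction:
        if m < h:
            return Fraction(0)  # fewer than h tosses: no string of h heads fits
        # h heads first (probability 2**-h): one string found, restart after it
        value = Fraction(1, 2 ** h) * (1 + E(m - h))
        # i - 1 heads then a tail (probability 2**-i), for i = 1..h
        for i in range(1, h + 1):
            value += Fraction(1, 2 ** i) * E(m - i)
        return value
    return E(n)


def brute_force_E(n: int, h: int) -> Fraction:
    """Average greedy count of non-overlapping strings of h heads."""
    total = 0
    for seq in product("HT", repeat=n):
        count, run = 0, 0
        for toss in seq:
            run = run + 1 if toss == "H" else 0
            if run == h:      # completed a block of h heads
                count += 1
                run = 0       # the next block must start afresh (no overlap)
        total += count
    return Fraction(total, 2 ** n)


for n in range(0, 13):
    for h in range(1, 5):
        assert recurrence_E(n, h) == brute_force_E(n, h)
```

For $h=1$ the recurrence reduces to $E(n)=\frac12+E(n-1)$, so $E(n)=\frac n2$, the expected number of heads. The memoized recursion goes about $n$ calls deep, so this sketch is only suited to modest $n$.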

Did I make this problem too complicated by approaching it this way? Is this approach even correct?

  • I think this approach makes sense, and that it is probably optimal. Computationally easy, for modest $n$... I suggest writing out the first hundred values and looking for provable patterns. If you can't get a closed formula, then at least you might get good bounds or asymptotics. – lulu Jun 09 '25 at 16:30
  • A run of exactly $m\cdot h$ consecutive heads would count as $m$ instances of $h$ consecutive heads, I expect. Then if $R(\ell)$ is the expected number of runs of at least $\ell$ successive heads in $n$ throws, the answer would be $R(h) + R(2h) + R(3h) + \dotsc$. No idea how complicated that computation would be. – Dermot Craddock Jun 09 '25 at 16:32

1 Answer


EDIT: Let us say that a head phase is a maximal block of consecutive heads, that is, a run of heads that is neither preceded nor followed by another head.
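To make the definition concrete, here is a small sketch that splits a toss sequence into its head phases, using the sequence $HTTHH$ discussed in the comments. The helper name `head_phases` is hypothetical; `itertools.groupby` groups consecutive equal symbols.

```python
from itertools import groupby


def head_phases(tosses: str) -> list[int]:
    """Lengths of the maximal blocks of consecutive 'H' in a string of H/T."""
    return [len(list(group)) for symbol, group in groupby(tosses) if symbol == "H"]


phases = head_phases("HTTHH")
print(phases)                              # [1, 2]
print(sum(1 for L in phases if L >= 1))    # 2 head phases of length at least 1
print(sum(1 for L in phases if L >= 2))    # 1 head phase of length at least 2
```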


As pointed out by Dermot Craddock, the expected number of non-overlapping subsequences of $h$ consecutive heads is $\sum_{k=1}^{N}R(kh)$, where $R(\ell)$ is the expected number of head phases whose length is at least $\ell$, and $N=\lfloor n/h\rfloor$ is the maximal possible number of non-overlapping subsequences of $h$ consecutive heads.
Indeed, for a given sequence of coin tosses, a head phase of length $L$ contains exactly $\lfloor L/h\rfloor$ non-overlapping subsequences of $h$ heads. When you sum, over $1\le j\le N$, the number of head phases of length at least $jh$, a head phase of length between $kh$ and $(k+1)h-1$ is counted once for each $j\le k$, that is, exactly $k=\lfloor L/h\rfloor$ times.
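The counting identity above holds sequence by sequence, not just in expectation, so it can be checked exhaustively for small $n$. A minimal sketch, with hypothetical helper names `head_phases` and `check_identity`:

```python
from itertools import groupby, product


def head_phases(tosses):
    """Lengths of the maximal blocks of consecutive 'H' (the head phases)."""
    return [len(list(group)) for symbol, group in groupby(tosses) if symbol == "H"]


def check_identity(n: int, h: int) -> bool:
    """True if, for every sequence of n tosses, sum(L // h) over head phases
    equals the sum over k = 1..N of the number of phases of length >= k*h."""
    N = n // h
    for seq in product("HT", repeat=n):
        phases = head_phases(seq)
        direct = sum(L // h for L in phases)      # non-overlapping strings of h heads
        layered = sum(
            sum(1 for L in phases if L >= k * h)  # phases of length at least k*h
            for k in range(1, N + 1)
        )
        if direct != layered:
            return False
    return True


assert all(check_identity(n, h) for n in range(1, 11) for h in range(1, n + 1))
```

This also addresses the overcounting worry: $R(h)$ counts head phases, not strings, so a phase of length $2h$ is counted once in $R(h)$ and once in $R(2h)$, for a total of $2$.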

But, as you can see in that older post, $R(\ell)$ is simply the expected number of strings of $\ell$ consecutive heads minus the expected number of overlaps, the latter being the expected number of strings of $\ell+1$ consecutive heads.
Indeed, each head phase of length $L\ge\ell$ contains exactly $L-\ell+1$ subsequences of length $\ell$ and $L-\ell$ subsequences of length $\ell+1$, so the difference counts each of these head phases exactly once: $$R(\ell)=\frac{n-\ell+1}{2^\ell}-\frac{n-\ell}{2^{\ell+1}}=\frac{n-\ell+2}{2^{\ell+1}}$$ These fractions are of the form you give in the OP.
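A brute-force check of the formula for $R(\ell)$, as a sketch under the same small-$n$ enumeration approach (the helper name `expected_long_phases` is hypothetical). It counts head phases of length at least $\ell$ in every sequence and compares the exact average with both forms of the formula:

```python
from fractions import Fraction
from itertools import groupby, product


def expected_long_phases(n: int, ell: int) -> Fraction:
    """Exact mean number of head phases of length at least ell in n tosses."""
    total = 0
    for seq in product("HT", repeat=n):
        total += sum(
            1 for symbol, group in groupby(seq)
            if symbol == "H" and len(list(group)) >= ell
        )
    return Fraction(total, 2 ** n)


for n in range(1, 11):
    for ell in range(1, n + 1):
        r = expected_long_phases(n, ell)
        # difference of the two overlapping counts ...
        assert r == Fraction(n - ell + 1, 2 ** ell) - Fraction(n - ell, 2 ** (ell + 1))
        # ... and its simplified form
        assert r == Fraction(n - ell + 2, 2 ** (ell + 1))
```

Equivalently, a head phase of length at least $\ell$ starts either at toss $1$ (probability $\frac{1}{2^\ell}$) or at one of the $n-\ell$ later positions preceded by a tail (probability $\frac{1}{2^{\ell+1}}$ each), which gives the same $\frac{n-\ell+2}{2^{\ell+1}}$.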

Then the expected value you ask for is $$\sum_{k=1}^N\frac{n-kh+2}{2^{kh+1}}=\frac{(n+2)(1-2^{-hN})}{2^{h+1}(1-2^{-h})}-\frac{h(1-2^{-hN}(N+1-2^{-h}N))}{2^{h+1}(1-2^{-h})^2}$$
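The closed form can be checked against the direct sum in exact arithmetic. A minimal sketch, with hypothetical function names and $x=2^{-h}$ used as shorthand in the code:

```python
from fractions import Fraction


def direct_sum(n: int, h: int) -> Fraction:
    """sum_{k=1}^{N} (n - k*h + 2) / 2**(k*h + 1) with N = n // h."""
    N = n // h
    return sum((Fraction(n - k * h + 2, 2 ** (k * h + 1)) for k in range(1, N + 1)),
               Fraction(0))


def closed_form(n: int, h: int) -> Fraction:
    """The closed form above, written with x = 2**-h."""
    N = n // h
    x = Fraction(1, 2 ** h)
    first = (n + 2) * (1 - x ** N) / (2 ** (h + 1) * (1 - x))
    second = h * (1 - x ** N * (N + 1 - x * N)) / (2 ** (h + 1) * (1 - x) ** 2)
    return first - second


for n in range(1, 41):
    for h in range(1, n + 1):
        assert direct_sum(n, h) == closed_form(n, h)

print(direct_sum(5, 2))   # 23/32
```

For instance, with $n=5$ and $h=2$ the two terms are $\frac{5}{8}$ and $\frac{3}{32}$, giving $\frac{23}{32}$.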

  • Thank you for your answer! For starters, I would like to gain some intuition on what it means for a sequence of $n$ tosses to have at least $l$ consecutive heads. Let's take the sequence $HTTHH$ (I have taken $n=5$). It's clear that the number of strings of at least 2 consecutive heads is 1 in this case. But it isn't clear to me what the number of strings of at least 1 head is in this case. – Abhay Agarwal Jun 10 '25 at 08:17
  • I completely understand how you evaluated the final sum by splitting it into $\sum_{k=1}^{N}\frac{n+2}{2^{kh+1}} - \sum_{k=1}^{N}\frac{kh}{2^{kh+1}}$. However, I am having some trouble understanding why
    1. the expected number of strings of $h$ consecutive heads is $\sum_{k=1}^{N}R(kh)$ with $N=\lfloor{\frac{n}{h}}\rfloor$ and

    2. $R(l)=\frac{n-l+1}{2^l}-\frac{n-l}{2^{l+1}}$

    – Abhay Agarwal Jun 10 '25 at 08:18
  • @AbhayAgarwal I tried to be more explicit in my answer. – Christophe Boilley Jun 10 '25 at 19:31
  • As a check, I notice that this does agree with the mean number of occurrences being asymptotically $\frac{n}{\mu}$, where $\mu=\frac{1-p^h}{qp^h}$ (Feller, v.1, p.324) and here $p=q=\frac12$. – r.e.s. Jun 19 '25 at 18:41
  • Apologies for my late response here. I think I'm getting the hang of your solution @Christophe Boilley. The only question I have is: doesn't $R(h)$ already include the subsequences of length $2h$, $3h$ and so on? So if we sum all the $R(kh)$, aren't we overcounting? – Abhay Agarwal Jun 25 '25 at 11:21
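The asymptotic check in r.e.s.'s comment can be reproduced numerically. This is a sketch that evaluates the answer's sum exactly and compares it with $\frac{n}{\mu}$, using $\mu$ as quoted in that comment. The function names are hypothetical, and the only claim tested is that the gap stays bounded as $n$ grows.

```python
from fractions import Fraction


def expected_count(n: int, h: int) -> Fraction:
    """The answer's sum of R(kh) = (n - k*h + 2) / 2**(k*h + 1), k = 1..n // h."""
    return sum((Fraction(n - k * h + 2, 2 ** (k * h + 1)) for k in range(1, n // h + 1)),
               Fraction(0))


def feller_rate(h: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """1 / mu with mu = (1 - p**h) / (q * p**h), as quoted in the comment."""
    q = 1 - p
    mu = (1 - p ** h) / (q * p ** h)
    return 1 / mu


for h in (1, 2, 3, 5):
    for n in (10, 100, 1000):
        # the gap E(n) - n / mu should stay bounded as n grows
        gap = expected_count(n, h) - n * feller_rate(h)
        print(h, n, float(gap))
```

With $p=q=\frac12$ one gets $\frac1\mu=\frac{1}{2^{h+1}-2}$, which matches the leading term of the closed form, since $2^{h+1}(1-2^{-h})=2^{h+1}-2$.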