21

No overlaps are allowed. We are counting runs of at least 5 heads, so that, for example, a run of 6 does not count as 2 runs of 5.

I have received an answer to this question from someone with a PhD in Statistics, yet their theoretical answer does not agree with my code simulation.

Theoretically, the answer would be

$$\frac{96}{32} - \frac{95}{64} + \frac{94}{128} - \frac{93}{256} + \frac{92}{512} - \cdots$$

since we expect $\frac{96}{32}=3$ instances of 5 heads in a row (if we allow overlaps), and applying the Inclusion-Exclusion Principle should correct for the double counting of runs of 6, 7, 8, and so on.
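
As a quick numerical check of the series above, here is a minimal sketch that sums it exactly with Python's `fractions` module. The function name `alternating_sum` and the choice to stop at the $\frac{1}{2^{100}}$ term are illustrative assumptions, not part of the original argument.

```python
from fractions import Fraction

def alternating_sum(n_tosses=100, run_length=5):
    """Sum 96/2^5 - 95/2^6 + 94/2^7 - ... down to the 1/2^n_tosses term."""
    total = Fraction(0)
    sign = 1
    for k in range(run_length, n_tosses + 1):
        # n_tosses - k + 1 starting positions, each all-heads with probability 1/2^k
        total += sign * Fraction(n_tosses - k + 1, 2**k)
        sign = -sign
    return total

print(float(alternating_sum()))
```

The result is approximately 2.007, which matches the "approximately 2" figure quoted for the theoretical answer.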

The problem is that this answer is approximately 2, but my simulation consistently gives about 1.5:

    import numpy as np
    import pandas as pd

    def random_list(length):
        random_list = np.zeros(length) #list of "length" zeros
        for i in range(len(random_list)):
            random_value = np.random.random() #instantiate random value for each i
            if random_value > 0.5: #for approximately half of the random values
                random_list[i] = 1
            else:
                random_list[i] = 0

        return random_list #random_list is now a random list of zeros and ones

    def count_ones(array):
        runs = 0
        i = 0
        while i < (len(array) - 1): #iterate over each index in list
            if array[i] == 1: #find a value of one
                j = i + 1 #the next value is j
                while array[j] == 1: #iterate over indices until we hit a zero or end of array
                    if j == (len(array)-1):
                        break #break out of loop if we are at the end of the list
                    j += 1
                k = j #we now have either the first zero after a list of ones, or we are at the end of the list
                ones = k - i #how many ones in a row
                if ones >= 5:
                    runs += 1 #count this as a run of 5
            else: #if array[i] == 0
                k = i # necessary so that code after if/else conditional runs
            i = k + 1 # loop will iterate over index after k

        return runs

    def average_runs(trials):
        results = np.zeros(trials)
        for i in range(trials): #do it a large number of times
            array = random_list(100)
            runs = count_ones(array)
            results[i] = runs
        average = sum(results)/len(results) #take the average

        return average

    average_runs(1000)
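
A comment further down points out that `count_ones` misses a run of exactly five heads ending on the last toss. As a hedged alternative, the sketch below counts maximal runs with `itertools.groupby` from the standard library. The name `count_runs` and its `min_length` parameter are hypothetical and not part of the code above.

```python
from itertools import groupby

def count_runs(tosses, min_length=5):
    """Count maximal runs of 1s whose length is at least min_length."""
    return sum(
        1
        for value, group in groupby(tosses)
        # groupby yields one group per maximal block of equal values
        if value == 1 and sum(1 for _ in group) >= min_length
    )

# A run of exactly five heads at the very end is counted once.
print(count_runs([0, 1, 1, 1, 1, 1]))
```

Because each maximal block of heads forms a single group, a run of 6 or 10 is counted once, which matches the no-overlap definition in the question.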

Can anyone explain why the simulation and theory do not agree?

Stephen Taul
  • Try that for real, either with an actual coin or a computer simulation and you should find that even runs of 10, let alone five, are common.

    I took my results to a Cambridge PhD in number theory, who said 'Yes… that's what most of my students find.'

    My own trial started not with mere binary coin tosses, but roulette spins… though in either case, runs of 12 or 13 in a row were not uncommon.

    If your particular simulation doesn't agree with whichever theory you're following, what does that suggest?

    Almost separately, why did you need such complex code for such a simple problem?

    – Robbie Goodwin May 28 '25 at 20:56
  • You don't need numpy, pandas, or any if/while logic to test this with python, by the way. Your code works great, but I had fun trying to write something hopefully simpler: https://colab.research.google.com/drive/19e5IeaPmDonNir5Em7m8mO65PRXbKZme?usp=sharing – TylerW May 29 '25 at 00:29
  • You might enjoy reading about Shannon's communications theory. – Carl Witthoft May 29 '25 at 15:27
  • 3k views in 2 days even without bounty... that's crazy. btw, I'm kinda disappointed not seeing a generating-function answer :) – Quý Nhân May 30 '25 at 13:31
  • There's a bug in your code: count_ones won't count a run of exactly $5$ heads right at the end. Therefore the number your code is estimating is actually $1.5$ :). Unsolicited advice: if you find yourself writing a for loop that iterates over a numpy array, either there is a better way to do what you want to achieve or you shouldn't be using numpy. If you're interested in some faster/more compact implementations, I compared a few to your approach here. (@TylerW you may be interested too) – Izaak van Dongen Jun 04 '25 at 15:57
  • I’m voting to close this question because Math.SE is not a coding site. – amWhy Jun 21 '25 at 20:22

3 Answers

33

There is no triple count here, because you cannot have $A_i$ and $A_{i+2}$ without $A_{i+1}$ (where $A_i$ means 5 consecutive heads starting in round $i$). So the answer is the expected number of runs of 5 consecutive heads (possibly with overlaps) minus the expected number of overlaps (which is exactly the expected number of runs of 6 consecutive heads): $$\frac{96}{32}-\frac{95}{64}=\frac{97}{64}\approx 1.52$$
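
The two-term formula can be checked by brute force for small numbers of tosses. The sketch below enumerates every sequence of length $n$ and compares the exact average with $\frac{n-4}{32}-\frac{n-5}{64}$. The function names and the chosen values of $n$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import groupby, product

def exact_expected_runs(n, min_length=5):
    """Average number of maximal head-runs of length >= min_length over all 2^n sequences."""
    total = 0
    for seq in product((0, 1), repeat=n):
        total += sum(
            1
            for value, group in groupby(seq)
            if value == 1 and len(list(group)) >= min_length
        )
    return Fraction(total, 2**n)

def formula(n):
    """Expected overlapping runs of 5 minus expected overlapping runs of 6."""
    return Fraction(n - 4, 32) - Fraction(n - 5, 64)

for n in (5, 6, 10, 12):
    print(n, exact_expected_runs(n), formula(n))

print(formula(100))
```

Enumeration is only feasible for small $n$, but the agreement there supports using the same formula at $n=100$.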

22

The infinite sum seems like overkill to me.

A toss "starts a run" of five or more consecutive heads if it and the next four coins are heads, and the previous coin, if any, was tails (otherwise it is just part of a longer run).

So the very first toss has a $\frac{1}{2^5} = \frac{1}{32}$ chance of starting a run, the last four cannot start a run at all, and the remaining 95 each have a $\frac{1}{2^6} = \frac{1}{64}$ chance of starting a run. Any run must start in exactly one place, so this covers all the possibilities. Total: $\frac{1}{32} + \frac{95}{64} = \frac{97}{64} = 1.515625$.
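
The "starts a run" rule translates directly into a simulation. Here is a minimal sketch using only the standard library; the function names, the trial count, and the fixed seed are arbitrary choices made for illustration.

```python
import random

def count_run_starts(tosses, min_length=5):
    """Count tosses that start a run: min_length heads from here, with no head just before."""
    starts = 0
    for i in range(len(tosses) - min_length + 1):
        preceded_by_tail = (i == 0) or (tosses[i - 1] == 0)
        if preceded_by_tail and all(tosses[i:i + min_length]):
            starts += 1
    return starts

def simulate(trials=10000, n=100, seed=0):
    """Monte Carlo estimate of the expected number of run starts in n fair tosses."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    total = 0
    for _ in range(trials):
        tosses = [rng.getrandbits(1) for _ in range(n)]
        total += count_run_starts(tosses)
    return total / trials

print(simulate(), 1 / 32 + 95 / 64)
```

The estimate should land close to $97/64$, up to sampling noise of roughly a few hundredths at this trial count.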

Toph
  • If the sequence contains 10 (or more) consecutive heads, then both the first and the sixth coin start a run of five but your method wouldn't count it for the sixth coin. This does make a difference but nowhere near enough to get to around 2 instead. – quarague May 29 '25 at 12:30
  • See the nice thing about the Original Asker including a computer program is that it perfectly encapsulates what OA meant by his definitions and on those definitions, the first and sixth will not each start his, "a run of at least 5," but it is regarded as one run of 10. – CR Drost May 29 '25 at 16:30
13

Here is another way to see that their "theoretical answer" is wrong (and Christophe Boilley's is correct, although verifying that our two expressions are exactly equal is not so easy).

What is the expected number of runs of exactly $k$? There are $101-k$ places this can start. In all but two of these, the probability of having a run of exactly $k$ starting there is $(\frac12)^{k+2}$, since we need $k$ heads with a tail on either side. The two exceptions are when the run starts at the first coin or ends at the 100th coin, and these have probability $(\frac12)^{k+1}$. Thus the overall expected number of runs of exactly $k$ is $(99-k)\times(\frac12)^{k+2}+2\times(\frac12)^{k+1}=(103-k)\times(\frac12)^{k+2}$, for any $k\leq 99$. (For $k=100$ the probability is $(\frac12)^{100}$; what goes wrong with the above is that the "two exceptions" coincide.)

Thus the total expectation of the number of runs of at least $5$ is $$\frac{98}{2^7}+\frac{97}{2^8}+\cdots+\frac{5}{2^{100}}+\frac{4}{2^{101}}+\frac{1}{2^{100}}\approx 1.516.$$
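
The series can also be summed exactly rather than approximately. The sketch below does so with `fractions.Fraction`; the helper name `expected_runs_exactly` is a hypothetical label for the per-length expectation derived above.

```python
from fractions import Fraction

def expected_runs_exactly(k, n=100):
    """Expected number of maximal runs of exactly k heads in n fair tosses."""
    if k == n:
        return Fraction(1, 2**n)  # the whole sequence is heads
    # (n - k - 1) interior starts need a tail on both sides: 1/2^(k+2) each
    # 2 boundary starts need a tail on one side only: 1/2^(k+1) each
    return Fraction(n + 3 - k, 2**(k + 2))

total = sum(expected_runs_exactly(k) for k in range(5, 101))
print(total, float(total))
```

Summing over $k=5,\dots,100$ gives exactly $\frac{97}{64}$, in agreement with the other two answers.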

  • To be sure, the series does sum to $97/64$ (see https://www.wolframalpha.com/input?i=%2898%2F2%5E7+%2B+97%2F2%5E8+%2B+...+%2B+4%2F2%5E%28101%29%29+%2B+%281%2F2%5E%28100%29%29). To do this "by hand", see e.g. https://math.stackexchange.com/questions/894998/explanation-of-the-formulas-for-sums-sum-nrn-and-sum-n2-rn – ronno May 28 '25 at 18:21
  • @Especially Lime Is the expected number of runs of exactly $k$ not equal to $\frac{101-k}{2^k}$? – Abhay Agarwal Jun 09 '25 at 16:28
  • @AbhayAgarwal no, because you need to take into account not only the probability of a given set of $k$ coins being heads, but also that the coin(s) either side are tails. – Especially Lime Jun 10 '25 at 07:50