
The premise is that you have a coin with unknown properties. You toss it and keep observing heads. How many trials do you need until you are 95% confident it is not a fair coin?

The null hypothesis is a fair coin. The mean is $\frac{1}{2}N$ and the variance is $\frac{1}{4}N$. The z score of observing N heads in N tosses is $\frac{N-\frac{1}{2}N}{\frac{1}{2}\sqrt{N}}$. Since I want a 95% interval, I solve for

$$\frac{N-\frac{1}{2}N}{\frac{1}{2}\sqrt{N}} = \sqrt{N} > 1.96,$$ which yields $N > 1.96^2 \approx 3.84$.
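
For concreteness, here is that arithmetic as a quick numerical check (a minimal sketch in Python):

```python
# Minimal sketch of the normal-approximation arithmetic above.
# Under the null (fair coin), N tosses have mean N/2 and standard
# deviation sqrt(N)/2, so the z score of N heads in N tosses is sqrt(N).
from math import sqrt

z_crit = 1.96  # two-sided 95% critical value of the standard normal
for N in range(1, 8):
    z = (N - N / 2) / (sqrt(N) / 2)  # equals sqrt(N)
    print(N, round(z, 3), z > z_crit)
# z first exceeds 1.96 at N = 4, since sqrt(N) > 1.96 means N > 3.8416
```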

This seems awfully low. Let me know if I have done something wrong.

Jason
  • 139

2 Answers


The short answer, from a frequentist perspective, is that the minimum number of flips that could possibly sway you against the coin being fair is obtained if all the flips are heads (or, equally, if all the flips are tails). The probability of this happening with a fair coin over $n$ flips is $2^{-n}$, and this first dips below $5$ percent at $n = 5$ flips. This is equivalent to a one-sided test for significance. If you require a two-sided test, then the probability of the flips being either all heads or all tails is $2^{-(n-1)}$, and this first drops below $5$ percent at $n = 6$ flips.
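
Here is a minimal sketch (in Python) confirming those two thresholds numerically:

```python
# Smallest n at which the all-heads probability 2**-n (one-sided test) or
# the all-heads-or-all-tails probability 2**-(n-1) (two-sided test) first
# drops below 5 percent.
alpha = 0.05

n = 1
while 2.0 ** -n >= alpha:
    n += 1
print("one-sided:", n)   # 5

n = 1
while 2.0 ** -(n - 1) >= alpha:
    n += 1
print("two-sided:", n)   # 6
```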


From a broader perspective, this question gets at the frequentist vs Bayesian debate. A Bayesian—to use the word attributively with a person—would say that the above analysis ignores any prior belief about the coin. If you look at the coin, and it has both a heads side and a tails side, and if you know you've flipped it $1000$ times before and it came down heads, say, $518$ times—consistent, under any reasonable assumption, with the coin being fair—you'll approach the event of it having flipped $5$ heads in a row quite differently than if you had never seen the coin before (including its other side).

To a Bayesian, the question is not addressed by seeing if we can falsify, at some level of statistical significance, the null hypothesis of the coin being fair. Instead, they would examine how the evidence of a sequence of flip outcomes adjusts any prior understanding about the coin's behavior.

They might, for example, start with a so-called uniform prior (meaning a presumption that the coin's probability of heads is uniformly distributed in the interval $[0, 1]$). With each flip, they update the distribution in accordance with Bayes's Theorem.

It turns out that these updates lead to the so-called Beta distributions. If you read about them, you may notice that the Beta distribution—which includes the uniform distribution as a special case—is a conjugate prior for the Bernoulli distribution, which among other things is a good model for independent and identically distributed coin flips. This simply means that the math works out nicely: if you have a Beta distribution as the prior, then after a flip, the new a posteriori distribution is also Beta.
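
As an illustration, here is a minimal sketch of that updating, assuming the uniform $\operatorname{Beta}(1,1)$ prior, a run of heads, and scipy for the Beta distribution; reporting the posterior probability that the heads probability exceeds $1/2$ is just one convenient way to summarize the posterior.

```python
# Minimal sketch of conjugate Bayesian updating under a uniform Beta(1, 1)
# prior on the coin's heads probability p.  After h heads and t tails the
# posterior is Beta(1 + h, 1 + t); here every observed flip is a head.
from scipy.stats import beta

a, b = 1.0, 1.0              # uniform prior = Beta(1, 1)
for flips in range(1, 11):
    a += 1                   # one more head observed
    prob_heads_biased = beta.sf(0.5, a, b)   # posterior P(p > 0.5)
    print(flips, round(prob_heads_biased, 4))
# After 5 straight heads the posterior P(p > 0.5) is about 0.984;
# after 10 it is about 0.9995.
```

With a different prior the same two-parameter update applies; only the starting values of the Beta parameters change.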

To be sure, there's no particular reason to favor the uniform prior, other than that it plays nicely. Other prior distributions might make more sense, depending on what you already know about the coin. That is in large part why the Bayesian perspective says the question can only be fully answered by stating the connection between what you believe of the coin before the experiment and what you believe of it afterward. If you are $100$ percent sure the coin was fair beforehand, then no length of experiment can ever convince you otherwise (as lulu pointed out). You will only ever view it as progressively stranger and more unlikely behavior from what you know with certainty to be a fair coin.


Incidentally, I would guess that you got $3.84$ because the count is too low for the normal approximation to be valid. The situation is small enough in scale that it's much more accurate to just rely on the binomial distribution.
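
To see how far off the normal approximation is here, a minimal sketch comparing it with the exact probability of a run of $n$ heads:

```python
# Contrast the exact probability of n heads in n fair tosses with its
# (one-sided) normal approximation P(Z > sqrt(n)).
from math import sqrt, erf

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

for n in range(1, 9):
    exact = 0.5 ** n                      # P(all heads | fair coin)
    approx = normal_upper_tail(sqrt(n))   # z score of n heads is sqrt(n)
    print(n, round(exact, 4), round(approx, 4))
# At n = 4: exact 0.0625 vs. approximate 0.0228, so the z test already looks
# significant even though the exact one-sided p-value is still above 0.05.
```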

Brian Tung
  • 35,584

It is actually quite straightforward to calculate the exact $p$-value of the outcome for any given number of throws, provided that all outcomes are heads. No approximation using the normal distribution is necessary.

We test the hypotheses $$H_0 : \pi = \pi_0 = 0.5 \quad \text{vs.} \quad H_a : \pi \ne 0.5,$$ where $$X \mid H_0 \sim \operatorname{Binomial}(n, \pi = 0.5)$$ counts the random number of heads obtained in $n$ throws. Observing $X = n$ (all heads) immediately gives the two-sided $p$-value

$$p = \Pr[(X = n) \cup (X = 0) \mid H_0] = \binom{n}{0} \pi_0^0 (1-\pi_0)^n + \binom{n}{n} \pi_0^n (1-\pi_0)^0 = 2 (1/2)^n = 2^{-n+1}.$$

Therefore, to reject $H_0$ with Type I error at most $\alpha$, we must have $p < \alpha$, or $$\color{red}{\boxed{n \ge \left\lceil 1 - \log_2 \alpha \right\rceil}} \tag{1}$$ For $\alpha = 0.05$, this yields $$n \ge 6.$$

Turning this around, we see that the probability of getting all heads in $n = 6$ fair coin tosses by random chance is simply $1/2^6 = 1/64 = 0.015625$, so the chance of getting such an extreme result (all heads or all tails) is twice this, or $0.03125$. As we showed above, this is the smallest sample size for which the two-sided test will have a Type I error not exceeding $0.05$, but we cannot "spend" the full Type I error because of the discrete nature of the test statistic: if $n = 5$, then we would have a $p$-value of $0.0625 > 0.05$.

That said, in the frequentist perspective, we have a direct and closed-form answer to your question: if you want to have no more than $\alpha$ probability of incorrectly concluding the coin is biased after seeing a run of all heads (or all tails), then you must toss the coin at least $n = \lceil 1 - \log_2 \alpha \rceil$ times.
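
For completeness, a minimal sketch verifying formula $(1)$ and the two $p$-values quoted above:

```python
# Check formula (1): the exact two-sided p-value of a run of n identical
# outcomes is 2**(1 - n), and the smallest n bringing it below
# alpha = 0.05 is ceil(1 - log2(alpha)).
from math import ceil, log2

alpha = 0.05
print(ceil(1 - log2(alpha)))      # 6

for n in (5, 6):
    p = 2.0 ** (1 - n)
    print(n, p, p < alpha)        # 5 -> 0.0625 (fail to reject), 6 -> 0.03125
```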

heropup
  • 143,828