7

I know that there are some related questions, but they seem to be overkill for this small exercise.

I have 10 (fair) coin tosses and am interested in the probability that I have at least 4 consecutive heads.

So I had a lot of different ideas, but I think many of them do not work too well.

1) Markov chain: But then, somehow, we need to keep track of the number of coins we have already tossed, so one "loses" the Markov property.

2) count all possibilities: we have $2^{10}$ possibilities in total and can subtract the ones that have at most 3 consecutive heads. But counting those directly seems nasty.

3) recursive equation? Let $p_{i,j}$ for $j \leq i$ be the probability that we have at least $j$ consecutive heads in $i$ tosses. But this also does not seem that easy.

So this question is asked in a book in the first chapter, so it shouldn't be too hard?

David
user136457

4 Answers

7

Let $S_N$ be the set of the strings over the alphabet $\Sigma=\{0,1\}$ with length $N$, avoiding $4$ consecutive $1$'s, and $T_N=|S_N|$. For $N\geq 5$, every element of $S_N$ starts with exactly one of the prefixes: $$0,\quad 10,\quad 110, \quad 1110$$ and whatever follows the prefix is an arbitrary element of $S_{N-1}$, $S_{N-2}$, $S_{N-3}$ or $S_{N-4}$ respectively, hence we have: $$ T_N = T_{N-1}+T_{N-2}+T_{N-3}+T_{N-4} $$ and: $$ T_1=2,\quad T_2=4,\quad T_3=8,\quad T_4=15$$ leading to: $$ T_{10}=773.$$ The probability that at least four consecutive heads appear is therefore: $$ 1-\frac{773}{2^{10}} = \frac{251}{1024}.$$
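For anyone who wants to check the recurrence numerically, here is a minimal Python sketch. The function names `count_avoiding` and `brute_force` are made up for this illustration, and the convention $T_0=1$ (the empty string) is an assumption that is consistent with $T_4=15$. The brute-force count is there only as an independent sanity check.

```python
from itertools import product


def count_avoiding(n, run=4):
    """Number of 0/1 strings of length n with no `run` consecutive 1's."""
    # For k < run every string is allowed, so T_k = 2**k (with T_0 = 1).
    t = [2 ** k for k in range(run)]
    for k in range(run, n + 1):
        # T_k = T_{k-1} + ... + T_{k-run}, one term per allowed prefix.
        t.append(sum(t[k - run:k]))
    return t[n]


def brute_force(n, run=4):
    """Same count, obtained by listing all 2**n strings (small n only)."""
    target = "1" * run
    return sum(target not in "".join(bits) for bits in product("01", repeat=n))


print(count_avoiding(10))            # 773
print(brute_force(10))               # 773
print(2 ** 10 - count_avoiding(10))  # 251
```

The intermediate values are $T_5=29$, $T_6=56$, $T_7=108$, $T_8=208$, $T_9=401$.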

Jack D'Aurizio
  • thank you for your answer. But I do not really get why prefixes only can be of these forms. It might be because I do not really get what you mean with prefixes. So for example $1011 \in S_4$, so you would say that it has prefix $10$? – user136457 Sep 27 '14 at 16:25
  • @user136457: exactly, yes. – Jack D'Aurizio Sep 27 '14 at 16:36
  • So I think now I got it. Thank you. But does this somehow generalize to other settings? For example if I want to calculate the expected value of number of times the sequence 12345 occurs if I draw 100'000 numbers uniformly at random from 1,...,9? So this seems to be a very similar question, but does something similar apply here? – user136457 Sep 27 '14 at 17:27
  • Can someone please explain why $T_N = T_{N-1}+T_{N-2}+T_{N-3}+T_{N-4}$? For example, why is $T_5 = T_4 + T_3 + T_2 + T_1$ which is $29$? Actually I don't understand the entire solution. – David Sep 27 '14 at 22:24
  • @David: If the prefix is $0$, it is followed by an element of $S_{N-1}$; if the prefix is $10$, it is followed by an element of $S_{N-2}$ and so on. – Jack D'Aurizio Sep 27 '14 at 23:02
  • @Jack D'Aurizio: I still don't quite understand about these prefixes. I see you are avoiding 4 ones in a row but I dont see why the # of valid combinations of len n is the sum of the # of valid combinations of the 4 previous shorter length strings. Can you give me a detailed explanation please cuz I really want to understand this concept. – David Sep 27 '14 at 23:47
  • @JackD'Aurizio Have you left out the case 1111? – user103828 Sep 28 '14 at 16:08
  • @user103828: no, since I was counting the strings that do not have four consecutive ones. 1111 is not an allowed prefix for an element of $S_N$. – Jack D'Aurizio Sep 28 '14 at 16:40
  • I still don't understand this prefix thing. I understand that for a string of length 4, there are 15 possible strings of $0$s and $1$s not having $4$ $1$s in a row. After that I am "lost". Please elaborate. – David Sep 29 '14 at 18:56
2

Another approach: take the formula for the probability of $5$+ consecutive heads in $10$ random draws and then add in the outcomes whose longest run is exactly $4$ (there are $139$ of those out of $1024$).

The formula for $5$+ consecutive out of $10$ random draws is $p^5(6-5p)$, which for $p=0.5$ gives $7/64 = 112/1024$.

Alternatively, you could take an "almost correct" formula for getting $4$+ consecutive out of $10$ random draws, but then you have to subtract the $5$ outcomes that it counts twice: the $2$ that are actually case $5$s and the $3$ that "collide" (see my other, enumerated answer to this same question).

The "almost" formula for $4$+ consecutive out of $10$ random draws is $p^4(7-6p)$, which for $p=0.5$ gives $4/16 = 256/1024$. Subtracting the $5$ outcomes we double counted with this formula gives the correct $251$ (out of $1024$).
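As a quick arithmetic check of the two routes above, here is a small sketch using exact fractions. The function names are hypothetical, and the corrections of $139/1024$ and $5/1024$ are hard-coded from the counts quoted in this answer, so they are only valid for $p=1/2$ and $10$ draws.

```python
from fractions import Fraction


def five_plus(p):
    """Formula quoted above for 5+ consecutive heads in 10 draws."""
    return p**5 * (6 - 5 * p)


def almost_four_plus(p):
    """The 'almost correct' formula for 4+ consecutive heads in 10 draws."""
    return p**4 * (7 - 6 * p)


p = Fraction(1, 2)

# Route 1: 5+ formula plus the 139 outcomes whose longest run is exactly 4.
route1 = five_plus(p) + Fraction(139, 1024)

# Route 2: 'almost' formula minus the 5 double-counted outcomes.
route2 = almost_four_plus(p) - Fraction(5, 1024)

print(five_plus(p))    # 7/64
print(route1, route2)  # 251/1024 251/1024
```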

David
  • how do you get the formula for 5+ consecutive out of 10 (and 4+ consecutive out of 10)? – user103828 Sep 30 '14 at 19:06
  • @user103828: I got a bunch of terms for each of the cases such as P(exactly 5), P(exactly 6)... but when you combine them all together into something like P(5+), a lot of the intermediate terms cancel out and leave a nice short formula. For example, for P(exactly 5) I have $6p^5 - 10p^6 + 4p^7$. P(exactly 6) is $5p^6 - 8p^7 + 3p^8$. P(exactly 7) is $4p^7 - 6p^8 + 2p^9$. P(exactly 8) is $3p^8 - 4p^9 + p^{10}$. P(exactly 9) is $2p^9 - 2p^{10}$. P(exactly 10) is $p^{10}$. You can see many terms cancel out for P(5+). Line up similar terms on paper and you can easily see that. – David Oct 01 '14 at 21:45
  • @user103828: I misplaced some of my notes that I could use to explain to you how I got these terms but I will keep looking for them and post another comment if I find them, explaining the formulas in more detail. I like my placeholder example cuz you can see exactly what is happening in the cases and how many there are of each. Other solutions tend to be more abstract and a number just "pops out" without the person really understanding why. Getting the right answer is important but understanding why is important too. – David Oct 01 '14 at 21:51
1

Just count them up using placeholders. Notice I start with the heads in the leftmost slots and then gradually work them to the right one placeholder at a time (case $8$ is a simple example to illustrate this).

Let H = Head, T = Tail, - = don't care (could be Head or Tail), (nH) = n consecutive Heads.

$10$ in a row max: ($10$H)
(only $1$ occurrence possible)

$9$ in a row max: ($9$H)T or T($9$H)
($2$ occurrences)

$8$ in a row max: ($8$H)T- or T($8$H)T or -T($8$H)
($5$ occurrences)

$7$ in a row max: ($7$H)T-- or T($7$H)T- or -T($7$H)T or --T($7$H)
($12$ occurrences)

$6$ in a row max: ($6$H)T--- or T($6$H)T-- or -T($6$H)T- or --T($6$H)T or ---T($6$H)
($28$ occurrences)

$5$ in a row max: ($5$H)T---- or T($5$H)T--- or -T($5$H)T-- or --T($5$H)T- or ---T($5$H)T or ----T($5$H)
($64$ occurrences)

$4$ in a row max: ($4$H)T----- or T($4$H)T---- or -T($4$H)T--- or --T($4$H)T-- or ---T($4$H)T- or ----T($4$H)T or -----T($4$H)
($144$ occurrences)

Case $4$ is special so it needs extra care. The following patterns match more than once so we have to make sure we only count that trial once as a winner.

($4$H)T----- and -----T($4$H) are case $5$s, not case $4$s, if all - are heads, so subtract these $2$ cases from $144$ to get $142$.

Next we have to make sure we don't double count cases where a string of $4$ consecutive heads can appear twice in the string of length 10. There are $3$ main patterns for that, namely: ($4$H)T($4$H)T, ($4$H)TT($4$H), and T($4$H)T($4$H).

($4$H)T----- collides with ----T($4$H)T if we have ($4$H)T($4$H)T so subtract $1$ from $142$ to get $141$.
($4$H)T----- collides with -----T($4$H) if we have ($4$H)TT($4$H) so subtract $1$ from $141$ to get $140$.
T($4$H)T---- collides with -----T($4$H) if we have T($4$H)T($4$H) so subtract $1$ from $140$ to get $139$.

So we have $144 - 2 - 1 - 1 - 1 = 139$ "corrected" occurrences of case $4$.

Total number of good outcomes is $1 + 2 + 5 + 12 + 28 + 64 + 139 = 251$.

$251 / 1024$ is about $24.5$%.

A slight advantage of this method is you can get a visual of what is happening and you can see how many of each case there are so for example, if you wanted to know $5$+ heads max in a row, just add up cases $5$ thru $10$ which would be $112$ total. A disadvantage is it is more work and is only practical for small numbers of flips such as $10$. If you were looking for $20$+ heads out of $100$ coin flips instead, then don't use this tedious method. Also, if you had asked for $3$+ heads, there would be more special cases to handle so this is not the best method for that situation.
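The case counts above can be double-checked by brute force. This is just a sketch that lists all $1024$ sequences and tallies them by their longest run of heads; the helper names `longest_head_run` and `tally` are invented for this example.

```python
from collections import Counter
from itertools import groupby, product


def longest_head_run(seq):
    """Length of the longest block of consecutive 'H' in seq (0 if none)."""
    return max((len(list(g)) for k, g in groupby(seq) if k == "H"), default=0)


def tally(n=10):
    """Map each longest-run length to the number of sequences having it."""
    return Counter(longest_head_run(s) for s in product("HT", repeat=n))


counts = tally()
for r in range(10, 3, -1):
    print(r, counts[r])  # 1, 2, 5, 12, 28, 64, 139 for r = 10 down to 4

print(sum(counts[r] for r in range(4, 11)))  # 251
print(sum(counts[r] for r in range(5, 11)))  # 112
```

Since it simply enumerates everything, this check has the same limitation as the hand count: it is only practical for a small number of flips.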

David
1

Let $P_N$ be the probability of at least $4$ consecutive heads in $N$ tosses and $p$ be the probability of a head. Then conditioning on the position of the first tail (that is, on the number of heads before it), \begin{align*} P_4 &=p^4 \\ P_5 &=(1-p)P_4+p^4= (2-p)p^4\\ P_6 &=(1-p)P_5+p(1-p)P_4+p^4=(3-2p)p^4 \\ P_7 &=(1-p)P_6+p(1-p)P_5+p^2(1-p)P_4+p^4=(4-3p)p^4 \end{align*} and for $N\geq 8$ $$ P_N = (1-p)P_{N-1}+p(1-p)P_{N-2}+p^2(1-p)P_{N-3}+p^3(1-p)P_{N-4}+p^4 $$ so \begin{align*} P_8&=(5-4p)p^4 \\ P_9&=(1-p)(5-p^4)p^4+p^4 \\ P_{10}&=(1-p)(6+p^5-3p^4)p^4+p^4 \end{align*} so when $p=1/2$, $$ P_4=0.0625 \qquad P_5=0.09375 \qquad P_6=0.125 \qquad P_7=0.15625 \qquad P_8=0.1875 \qquad P_9=0.216797 \qquad P_{10}=0.245117 $$

Edit: As @Byron Schmuland pointed out in the comments, $p^4$ was missing from $P_6$ and $P_7$.
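The recursion can also be run mechanically. Below is a minimal sketch with exact fractions; the function name `prob_run` and the convention $P_k=0$ for $k<4$ are my own assumptions for the illustration (with that convention the single formula for $N\geq 8$ also reproduces $P_4,\dots,P_7$).

```python
from fractions import Fraction


def prob_run(n, run=4, p=Fraction(1, 2)):
    """Probability of at least `run` consecutive heads in n tosses."""
    q = 1 - p
    # P[k] = 0 for k < run: too few tosses to contain the run.
    P = [Fraction(0)] * (n + 1)
    for k in range(run, n + 1):
        # Either the first `run` tosses are all heads (p**run), or there are
        # j < run heads followed by a tail and we start over with k-j-1 tosses.
        P[k] = p**run + sum(p**j * q * P[k - j - 1] for j in range(run))
    return P[n]


for n in range(4, 11):
    print(n, prob_run(n), float(prob_run(n)))

print(prob_run(10))  # 251/1024
```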

user103828
  • 1
    There should be a $+p^4$ at the end of your formulas for $P_6$ and $P_7$, just like at the end of your formula for $P_5$. –  Sep 28 '14 at 16:23