53

You play a game flipping a fair coin. You may stop after any trial, at which point you are paid in dollars the percentage of heads flipped. So if on the first trial you flip a head, you should stop and earn \$100 because you have 100% heads. If you flip a tail then a head, you could either stop and earn \$50, or continue on, hoping the ratio will exceed 1/2. This second strategy is superior.

A paper by Medina and Zeilberger (arXiv:0907.0032v2 [math.PR]) says that it is an unsolved problem to determine if it is better to continue or stop after you have flipped 5 heads in 8 trials: accept \$62.50 or hope for more. It is easy to simulate this problem and it is clear from even limited experimental data that it is better to continue (perhaps more than 70% chance you'll improve over \$62.50):
alt text
My question is basically: Why is this difficult to prove? Presumably it is not that difficult to write out an expression for the expectation of exceeding 5/8 in terms of the cumulative binomial distribution.


(5 Dec 2013). A paper on this topic was just published:

Olle Häggström, Johan Wästlund. "Rigorous computer analysis of the Chow-Robbins game." (pre-journal arXiv link). The American Mathematical Monthly, Vol. 120, No. 10, December 2013. (Jstor link). From the Abstract:

"In particular, we confirm that with 5 heads and 3 tails, stopping is optimal."

Daniel Fischer
  • 211,575
  • Would you go on with a game when winning in 70% of the cases a little money and loosing a lot in the remaining 30% of the cases? – j.p. Oct 06 '10 at 12:56
  • 7
    There are plenty of problems which are easy to simulate and to guess an answer to where rigorously proving that answer is wide open, e.g. things involving primes, self-avoiding random walks... – Qiaochu Yuan Oct 06 '10 at 13:00
  • @jug: Yes, that does have to be factored into any decision. But it certainly can be quantified precisely. – Joseph O'Rourke Oct 06 '10 at 13:05
  • @Qiaochu: Point well taken! But this one feels different to me, in that one presumably can write out (although I haven't) explicit expressions. – Joseph O'Rourke Oct 06 '10 at 13:08
  • 6
    @Joseph O'Rourke: have you tried actually doing that? I think it will end up being hard. – Qiaochu Yuan Oct 06 '10 at 13:18
  • @Qiaochu: Perhaps that is the only way to answer my question: Write out the expectation explicitly. When time permits I'll give it a try. Or if anyone else is facile in this domain, you are welcome to lend a hand... – Joseph O'Rourke Oct 06 '10 at 13:30
  • 1
    @Joseph: Thanks to the law of the iterated logarithm you are guaranteed $50. But what will your strategy be for continuing or stopping after having drawn $k$ after $n$ throws? The expectation depends on your strategy, which has to be a function of $k$ and $n$. Your graph hides the fact that you have to stop without knowing the future. – j.p. Oct 06 '10 at 13:42
  • @jug: One possible strategy would be: Stop when the current ratio exceeds the expected future maximum ratio. You are right that there is more than one strategy! – Joseph O'Rourke Oct 06 '10 at 13:54
  • 1
    @Joseph: What do you mean with the expected future maximum ration? – j.p. Oct 06 '10 at 14:21
  • From my point you want to predict the peak for the next X coin flips given your current score and stop when you hit it. Finding the optimum balance between value of the peak and probablity of exceeding or hitting it seems challenging. – Raynos Oct 06 '10 at 15:53
  • 1
    What if you know in advance the coin will come up heads p% of the time, even for p!=50? That's sort of the question I asked at: http://quant.stackexchange.com/questions/2172/probability-distribution-of-maximum-value-of-binary-option (and no answers there yet either) –  Feb 01 '14 at 15:56

2 Answers2

23

I accept Qiaochu's answer "Have you tried actually doing that?" I did try now, and now I can appreciate the challenge. :-) The paper I cited refers to another by Chow and Robbins from 1965 that has a beautiful formulation, much clearer than the cummulative binomials with which I struggled. Let me explain it, because it is really cool.

For the natural strategy I mentioned in the comments (and echoed by Raynos), let $f(n,h,t)$ be the expected payoff if you start with $h$ heads and $t$ tails, and let the game continue no more than $n$ further trials. Then there is an easy recursive formulation for $f$: $$f(n,h,t) = \max \left( \frac{1}{2} f(n,h+1,t) + \frac{1}{2} f(n,h,t+1) , h/(h+t) \right) $$ because you have an equal chance of increasing to $h+1$ heads or to $t+1$ tails on the next flip if you continue, and you get the current ratio if you stop. Now, when $h+t=n$, you need to make a "boundary" assumption. Assuming the law of large numbers for large $n$ leads to the reasonable value $\max ( 1/2, h/n )$ in this case. So now all you need to do is fill out the table for all $h$ and $t$ values up to $n=h+t$. The real answer is the limit when $n \rightarrow \infty$, but using large $n$ approximates this limit.

After the Medina and Zeilberger paper was released, in fact just about three weeks ago, a very careful calculation using the above recursive formulation was made by Julian Wiseman and reported on this web page. The conclusion is to me remarkable: "Choosing to continue lowers the expected payoff [from 0.625] to 0.62361957757." This is still not a proof, but the "answer" is now known. So my "it is clear from even limited experimental data that" was completely wrong! I am delighted to learn from my mistake.

  • on the rhs of your recursion relation for $f(n,h,t)$, shouldn't the $n$ be $n-1$? – ronaf Oct 07 '10 at 14:13
  • Ah, I see that is confusing. The answer is 'No' but I see that perhaps I mis-stated the prose meaning of $f$. There are still a total of $n$ trials even after you flip and flip. I want $n$ to be fixed so that $h+t$ eventually reaches $n$. – Joseph O'Rourke Oct 07 '10 at 14:41
5

This seems to be related to Gittins Indices. Gittins Indices are a way of solving these kind of optimal stopping problems for some classes of problems, and basically give you a way of balancing how much you are expected to gain given your current knowledge and how much more you could gain by risking obtaining more information about the process (or probability of flipping heads, etc).

Bruno

Bruno
  • 496