Find optimal play by optimizing orders of each player alternatingly

Question

A zero-sum game for two players allows a player to take no action during a turn. Can I reach optimal play (where both players always choose the best possible action in each turn) by the following scheme?

1. Randomly choose move orders $A_{1}$: $O_{A1}, O_{A2}, ...$ for player A over turns $1,2,...$
2. Find player B's optimal orders $B_{1}$: $O_{B1}, O_{B2}, ...$ for countering $A_{1}$ (with an integer linear programming solver). If $B_{1}$ makes some order $O_{Ai}$ in $A_{1}$ invalid, pretend that player A takes no action during turn $i$. For example, capturing a piece can make $O_{A3}$ an invalid order. In this case, pretend $O_{A3}$ does not exist.
3. Same as step 2, except that we find player A's optimal orders $A_{2}$ that counter $B_{1}$.
4. Repeat steps 2 and 3 until the orders of both sides no longer change.

Optimal play, which is a Nash equilibrium, exists. The scheme looks like coordinate descent, except for the part that ignores invalid moves.

I ask this question because I am trying to use an integer linear programming solver instead of implementing minimax for finding optimal play.

score 0 · Accepted Answer · answered Jun 11 '21 at 20:34

No. If the game is the prisoner's dilemma, then depending on the random choices, this could converge to "everyone defects", which is not (globally) optimal; the (globally) optimal solution is "everyone cooperates". The scheme may also fail to converge at all and loop forever. Even if it does converge, it may reach only a local optimum, not a global one.
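To make the "loop forever" case concrete, here is a minimal sketch of alternating best responses on a small zero-sum matrix game. It is a simplification, not the asker's setup: each row or column index stands in for one complete order sequence, moves are simultaneous, and the function name `alternating_best_response` and both payoff matrices are made up for illustration. In matching pennies the only equilibrium is mixed (each side plays 50/50), so pure best responses chase each other in a cycle, while the second matrix has a saddle point and the iteration settles on it.

```python
def alternating_best_response(payoff, a_start):
    """Alternate pure best responses in a zero-sum matrix game.

    payoff[a][b] is the payoff to player A; player B receives the negative.
    Hypothetical helper for illustration only. Terminates because A has
    finitely many strategies, so A's choice must either stay put or repeat.
    """
    a = a_start
    seen = {a}
    while True:
        # Step 2 of the scheme: B counters A's current choice (minimizes A's payoff).
        b = min(range(len(payoff[a])), key=lambda j: payoff[a][j])
        # Step 3 of the scheme: A counters B's choice (maximizes A's payoff).
        a_next = max(range(len(payoff)), key=lambda i: payoff[i][b])
        if a_next == a:
            return ("converged", a, b)   # mutual best responses: a pure equilibrium
        if a_next in seen:
            return ("cycle", a_next, b)  # A is back at an earlier choice: no convergence
        seen.add(a_next)
        a = a_next


# Matching pennies: A wins 1 on a match, loses 1 otherwise.
matching_pennies = [[1, -1],
                    [-1, 1]]

# A game with a saddle point at row 1, column 1 (value 2).
saddle_game = [[3, 1],
               [4, 2]]

print(alternating_best_response(matching_pennies, 0))
print(alternating_best_response(saddle_game, 0))
```

The iteration stops on its own only when a pure-strategy saddle point exists and is reached; when the equilibrium is mixed, some other method (for example, solving the game as a linear program over mixed strategies) is needed.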
