Representing similar states in reinforcement learning?

Question

Let's say I'd like to design a Q-learning algorithm that learns to play poker. The number of possible states is very large, but many of them are very similar. For example, if the state with the 10 of spades, 4 of hearts, and 6 of clubs on the table, while holding the King and Queen of hearts, has already been visited, I would like it to affect the weights of similar states, such as the same cards with different suits. How do I accomplish this?

score 1 · Answer 1 · edited Jun 04 '19 at 01:49

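Relabel the suits instead of using their real names. Call the suit of your highest card suit1 and, if your lower card has a different suit, call that one suit2. Then do the same with the cards on the ground (the table), giving each new suit the next free label.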

In your example, it would be the King and Queen of suit1 in your hand, and the 4 of suit1, the 10 of suit2, and the 6 of suit3 on the ground.
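A minimal sketch of that relabelling in Python, in case it helps. The function name `canonical_state`, the `(rank, suit)` tuple format, and the choice to scan cards from highest rank to lowest are my own assumptions, not something specified above. One caveat: when two cards of the same rank but different suits compete for a label, this greedy scheme falls back on input order, so a few equivalent states can still end up with different keys.

```python
RANKS = "23456789TJQKA"  # "T" is the ten; the index gives rank strength


def canonical_state(hand, table):
    """Return a suit-relabelled, hashable key for (hand, table).

    Cards are (rank, suit) pairs such as ("K", "h"). Suits are renamed
    1, 2, 3, ... in order of first appearance, scanning the hand from the
    highest rank down and then the table from the highest rank down.
    """
    labels = {}  # original suit -> canonical suit number

    def high_to_low(cards):
        # Stable sort on rank only; equal ranks keep their input order.
        return sorted(cards, key=lambda card: RANKS.index(card[0]), reverse=True)

    def relabel(cards):
        out = []
        for rank, suit in high_to_low(cards):
            if suit not in labels:
                labels[suit] = len(labels) + 1
            out.append((rank, labels[suit]))
        return tuple(out)

    canonical_hand = relabel(hand)    # hand goes first, so it claims suit 1
    canonical_table = relabel(table)
    return canonical_hand, canonical_table


# The state from the question, and the same cards with the suits permuted.
hearts = canonical_state(
    hand=[("K", "h"), ("Q", "h")],
    table=[("T", "s"), ("4", "h"), ("6", "c")],
)
diamonds = canonical_state(
    hand=[("Q", "d"), ("K", "d")],
    table=[("6", "s"), ("T", "c"), ("4", "d")],
)
```

Both calls produce the same key, so a Q-table indexed by `canonical_state(...)` would share one entry between them and any update to one is automatically an update to the other.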

edited Jun 04 '19 at 01:49

Ethan

answered Aug 06 '18 at 09:34

parvij

score 1 · Answer 2 · answered Nov 30 '19 at 21:23

I like your use of the word "like". It means "having the same characteristics or qualities as; similar to." It means that in some ways it is the same, but implies that in some ways it is different. For this problem I am going to read your "like" as if you were saying "similar in a general sense, but dissimilar in significant enough ways to drive my current approach".

One paraphrase of your main question: how do I connect similar or effectively identical states in the state space so that the training rate and training quality are maximized, without having to rely on a priori knowledge such as combinatorics?

If I had to do this, I would use a graph network to represent the transition paths, find connection groups that have similar statistics, and preferentially explore them as a paired test. If the weighted connections in the similar subgraphs align within a tolerance band, we could call them approximately isomorphic and set up something like a file link, so that any attempt to perform Q-learning in the isomorphic domain operates only on the non-copy. As long as "close enough" is well specified, this could truncate the search space substantially.
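Here is a rough sketch of the "file link" part only, under some loud assumptions: I use each state's per-action Q-values as the statistic being compared (the subgraph comparison itself is not implemented), and the class `AliasedQTable` and its method names are hypothetical, made up for illustration. The paired-test exploration is left out.

```python
from collections import defaultdict


class AliasedQTable:
    """Q-table in which states judged 'close enough' share one entry."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.q = defaultdict(lambda: {a: 0.0 for a in self.actions})
        self.alias = {}  # linked state -> the state it points to

    def resolve(self, state):
        # Follow links until reaching the representative ("non-copy") state.
        while state in self.alias:
            state = self.alias[state]
        return state

    def update(self, state, action, reward, next_state):
        # One-step Q-learning update, applied to representatives only.
        s, s_next = self.resolve(state), self.resolve(next_state)
        target = reward + self.gamma * max(self.q[s_next].values())
        self.q[s][action] += self.alpha * (target - self.q[s][action])

    def link_if_similar(self, a, b, tolerance):
        """Link b to a when their per-action values agree within tolerance."""
        a, b = self.resolve(a), self.resolve(b)
        if a == b:
            return True
        gap = max(abs(self.q[a][act] - self.q[b][act]) for act in self.actions)
        if gap > tolerance:
            return False
        self.alias[b] = a
        del self.q[b]  # only the representative keeps stored values
        return True


table = AliasedQTable(actions=["fold", "call"])
table.update("s1", "call", reward=1.0, next_state="end")
table.update("s2", "call", reward=1.0, next_state="end")
linked = table.link_if_similar("s1", "s2", tolerance=0.05)
table.update("s2", "call", reward=1.0, next_state="end")  # lands on s1's entry
```

After the link, every update aimed at `"s2"` is redirected to `"s1"`, which is where the reduction in stored states would come from. How to pick `tolerance`, and how much evidence to require before linking, is exactly the "close enough" question and is not settled by this sketch.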

(still working) To do:

set up a poker-analog and use the graph-centric approach to handle isomorphous regions
compare with classic Q-learning
compare with combinatoric (expert) speedup
perhaps find a "like" question for which the combinatoric method is not viable, and apply graph-based search-space reduction
