Let's say I'd like to design a Q-learning algorithm that learns to play poker. The number of possible states is very large, but many are very similar: for example, if the state with the 10 of spades, 4 of hearts, and 6 of clubs on the table, while holding the King and Queen of hearts, has already been visited, I would like that to affect the weights of similar states, such as the same ranks with different suits. How do I accomplish this?
2 Answers
I like your use of the word "like": "having the same characteristics or qualities as; similar to." It implies that the states are the same in some ways but different in others. For this problem I will read your "like" as "similar in a general sense, but dissimilar in ways significant enough to drive the current approach".
One paraphrase of your main question: how do I connect similar or effectively identical states in the state space so that the training rate and training quality are maximized, without relying on a priori knowledge such as combinatorics?
If I had to do this, I would use a graph network to represent the transition paths, find connected groups with similar statistics, and preferentially explore them as a paired test. If the weighted connections in similar subgraphs align within a tolerance band, we could call them an approximate isomorphism and set up something like a file link, so that any attempt to perform Q-learning in an isomorphic region operates only on the single non-copy. As long as "close enough" is well specified, this could truncate the search space substantially.
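Before the graph machinery, the suit symmetry in the question can already be exploited exactly by canonicalization: map every state to a representative key so that suit-isomorphic states share one Q-table entry. This is a minimal sketch, not the graph-based method described above; `canonical_state` and the card encoding are my own hypothetical choices, and it assumes cards arrive in a fixed order (sorting before relabeling would make it order-independent).

```python
from collections import defaultdict

def canonical_state(board, hand):
    """Relabel suits in order of first appearance, so that e.g.
    (10s, 4h, 6c | Kh, Qh) and (10h, 4d, 6s | Kd, Qd) get the same key.
    Cards are (rank, suit) tuples; ranks 2-14, suits arbitrary labels."""
    relabel = {}

    def canon(cards):
        out = []
        for rank, suit in cards:
            if suit not in relabel:
                relabel[suit] = len(relabel)  # first unseen suit -> 0, next -> 1, ...
            out.append((rank, relabel[suit]))
        return tuple(out)

    return (canon(board), canon(hand))

# Tabular Q-learning then operates only on canonical keys, so one visit
# updates the shared entry for the whole isomorphism class.
Q = defaultdict(float)

def q_update(board, hand, action, reward, alpha=0.1):
    key = (canonical_state(board, hand), action)
    Q[key] += alpha * (reward - Q[key])  # simplified one-step update, no bootstrap term

s1 = ([(10, "s"), (4, "h"), (6, "c")], [(13, "h"), (12, "h")])
s2 = ([(10, "h"), (4, "d"), (6, "s")], [(13, "d"), (12, "d")])
assert canonical_state(*s1) == canonical_state(*s2)  # suit-isomorphic states collapse
```

The graph approach generalizes this: instead of an exact, hand-coded symmetry, it would discover approximately equivalent regions from the transition statistics themselves.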
(still working) To do:
- set up a poker analog and use the graph-centric approach to handle isomorphic regions
- compare with classic Q-learning
- compare with the combinatoric (expert) speedup
- perhaps find a "like" question for which the combinatoric method is not viable, and apply graph-based search-space reduction