Questions tagged [q-learning]
A model-free reinforcement learning technique.
128 questions
52
votes
3 answers
What is "experience replay" and what are its benefits?
I've been reading Google DeepMind's Atari paper and I'm trying to understand the concept of "experience replay". Experience replay comes up in a lot of other reinforcement learning papers (particularly the AlphaGo paper), so I want to understand…
Ryan Zotti
- 4,209
- 3
- 21
- 33
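For context on the question above: experience replay means storing each transition (state, action, reward, next state) in a buffer and training on random minibatches drawn from it, rather than on consecutive steps. A minimal sketch of such a buffer in Python; the class and method names are illustrative, not from the DeepMind paper:

import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling decorrelates consecutive transitions,
        # which is one of the main benefits the question asks about.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)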
12
votes
1 answer
Why could my DDQN get significantly worse after beating the game repeatedly?
I've been trying to train a DDQN to play OpenAI Gym's CartPole-v1, but found that although it starts off well and starts getting full score (500) repeatedly (at around 600 episodes in the pic below), it then seems to go off the rails and do worse…
Danny Tuppeny
- 223
- 2
- 7
12
votes
1 answer
Reinforcement learning: decreasing loss without increasing reward
I'm trying to solve OpenAI Gym's LunarLander-v2.
I'm using the Deep Q-Learning algorithm. I have tried various hyperparameters, but I can't get a good score.
Generally the loss decreases over many episodes but the reward doesn't improve much.
How…
Atuos
- 327
- 1
- 2
- 7
11
votes
2 answers
RL advantage function: why A = Q - V instead of A = V - Q?
In RL Course by David Silver - Lecture 7: Policy Gradient Methods, David explains what an advantage function is, and how it is the difference between Q(s,a) and V(s).
Preliminaries, from this post:
First recall that a policy $\pi$ is a mapping…
Kari
- 2,756
- 2
- 21
- 51
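For context on the question above, the standard definition (consistent with the lecture's notation) is
$$A^\pi(s,a) = Q^\pi(s,a) - V^\pi(s), \qquad V^\pi(s) = \mathbb{E}_{a \sim \pi}\left[Q^\pi(s,a)\right],$$
so the advantage measures how much better action $a$ is than the policy's average behaviour in state $s$ (positive means better than average). Defining it as $V - Q$ would simply flip that sign.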
10
votes
2 answers
Is this a Q-learning algorithm or just brute force?
I have been playing with an algorithm that learns how to play tic-tac-toe. The basic pseudocode is:
repeat many thousand times {
    repeat until game is over {
        if (board layout is unknown or exploring) {
            move randomly
        } else {
            move…
Ant Kutschera
- 211
- 1
- 7
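What would distinguish Q-learning from brute-force memorization is the bootstrapped value update, which propagates reward back through earlier states instead of just recording outcomes per board layout. A minimal tabular sketch in Python, with illustrative names:

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Q maps each state to a list of per-action value estimates.
    # Target: immediate reward plus discounted value of the best next action.
    target = r + gamma * max(Q[s_next])
    # Nudge the current estimate a fraction alpha toward the target.
    Q[s][a] += alpha * (target - Q[s][a])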
10
votes
2 answers
Why does Q-learning diverge?
My Q-Learning algorithm's state values keep on diverging to infinity, which means my weights are diverging too. I use a neural network for my value-mapping.
I've tried:
Clipping the "reward + discount * maximum value of action" (max/min set to…
nedward
- 414
- 5
- 13
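For reference, the DQN paper addressed this instability by clipping the reward to [-1, 1] (rather than the whole bootstrapped target) and by computing targets from a separate, slowly updated target network. A sketch of the reward-clipped target in Python; the function name is illustrative:

def clipped_td_target(r, next_q_values, done, gamma=0.99):
    # Clip the reward, not the full target: clipping the bootstrapped
    # sum biases the value estimates themselves.
    r = max(-1.0, min(1.0, r))
    return r if done else r + gamma * max(next_q_values)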
9
votes
1 answer
Understanding Reinforcement Learning with Neural Net (Q-learning)
I am trying to understand reinforcement learning and Markov decision processes (MDPs) in the case where a neural net is being used as the function approximator.
I'm having difficulty with the relationship between the MDP where the environment is…
CatsLoveJazz
- 247
- 1
- 10
8
votes
2 answers
How to teach neural network a policy for a board game using reinforcement learning?
I need to use reinforcement learning to teach a neural net a policy for a board game. I chose Q-learning as the specific algorithm.
I'd like a neural net to have the following structure:
layer - rows * cols + 1 neurons - input - values of…
Luke
- 189
- 1
- 11
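The structure the excerpt begins to describe (rows * cols board inputs plus one extra neuron) might look like the following in Keras; the hidden-layer size and board dimensions here are assumptions, since the excerpt is truncated:

from tensorflow.keras import layers, models

rows, cols = 3, 3  # illustrative board size
model = models.Sequential([
    layers.Input(shape=(rows * cols + 1,)),          # one input per cell, plus one extra
    layers.Dense(64, activation="relu"),             # assumed hidden layer
    layers.Dense(rows * cols, activation="linear"),  # one Q-value per possible move
])
model.compile(optimizer="adam", loss="mse")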
8
votes
1 answer
Understanding advantage functions
The paper explaining 'Advantage Updating' as a method to improve Q-learning uses the following as its motivation.
Q-learning requires relatively little computation per update, but it is useful to consider how the number of updates required scales…
mallochio
- 91
- 1
- 6
7
votes
2 answers
Representing similar states in reinforcement learning?
Let's say I'd like to design a Q-learning algorithm that learns to play poker. The number of different possible states is very large, but many are very similar: for example, if the initial state has 10 spades, 4 hearts, 6 clubs on the table and…
shakedzy
- 699
- 1
- 5
- 24
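One common answer to this kind of question is to map raw states to features that collapse equivalent situations before indexing the Q-table. A toy sketch for the poker example; the card objects with .rank and .suit attributes are hypothetical:

from collections import Counter

def poker_state_key(table_cards):
    # Specific suits don't matter, only the rank multiset and the suit
    # pattern, so 10 spades / 4 hearts / 6 clubs and 10 hearts / 4 spades / 6 diamonds
    # collapse to the same key.
    ranks = tuple(sorted(card.rank for card in table_cards))
    suit_pattern = tuple(sorted(Counter(card.suit for card in table_cards).values()))
    return (ranks, suit_pattern)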
7
votes
3 answers
Why random sample from replay for DQN?
I'm trying to gain an intuitive understanding of deep reinforcement learning. In deep Q-networks (DQN) we store all actions/environments/rewards in a memory array and at the end of the episode, "replay" them through our neural network. This makes…
ZAR
- 203
- 3
- 7
7
votes
1 answer
Simple Q-Table Learning: Understanding Example Code
I'm trying to follow a tutorial for Q-Table learning from this source, and am having difficulty understanding a small piece of the code. Here's the entire block:
import gym
import numpy as np
env = gym.make('FrozenLake-v0')
#Initialize table with…
aalberti333
- 195
- 1
- 5
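For readers without the tutorial at hand, the initialization and update loop that the excerpt cuts off typically look like this; a sketch against the old gym API the tutorial uses, not the tutorial's exact code:

import gym
import numpy as np

env = gym.make('FrozenLake-v0')
# One row per state, one column per action, all zeros to start
Q = np.zeros([env.observation_space.n, env.action_space.n])
lr, gamma = 0.8, 0.95
for episode in range(2000):
    s = env.reset()
    done = False
    while not done:
        # Greedy action plus decaying random noise for exploration
        noise = np.random.randn(env.action_space.n) / (episode + 1)
        a = int(np.argmax(Q[s, :] + noise))
        s_next, r, done, _ = env.step(a)
        # Standard tabular Q-learning update
        Q[s, a] += lr * (r + gamma * np.max(Q[s_next, :]) - Q[s, a])
        s = s_next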
7
votes
1 answer
What are the advantages / disadvantages of off-policy RL vs on-policy RL?
There are various algorithms for reinforcement learning (RL). One way to group them is into "off-policy" and "on-policy" methods. I've heard that SARSA is on-policy, while Q-learning is off-policy.
I think they work as follows:
My questions are:
How exactly…
Martin Thoma
- 19,540
- 36
- 98
- 170
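The distinction shows up directly in the two update rules. SARSA (on-policy) bootstraps from the action $a'$ that its own policy actually takes next:
$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma\, Q(s',a') - Q(s,a) \right]$$
Q-learning (off-policy) bootstraps from the greedy action, regardless of what the behaviour policy did:
$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$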
6
votes
1 answer
Keras input dimension bug?
Keras has a problem with the input dimension. My first layer looks like this:
model.add(Dense(128, batch_size=1, input_shape=(150,), kernel_initializer="he_uniform", kernel_regularizer=regularizers.l2(0.01), activation="elu"))
As you can see the…
Ilovescience
- 91
- 6
5
votes
1 answer
Q learning - how to use experience replay, when playing against other agent?
I am currently trying to create a tic-tac-toe Q-learning neural network to introduce myself to reinforcement learning; however, it didn't work, so I decided to try a simpler project requiring a network to train against static data rather than another…
Peter Jamieson
- 127
- 5