Questions tagged [q-learning]

A model-free reinforcement learning algorithm that learns the value of taking each action in each state.

128 questions
52 votes • 3 answers

What is "experience replay" and what are its benefits?

I've been reading Google DeepMind's Atari paper and I'm trying to understand the concept of "experience replay". Experience replay comes up in a lot of other reinforcement learning papers (particularly the AlphaGo paper), so I want to understand…
Ryan Zotti • 4,209 • 3 • 21 • 33
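For context, a minimal sketch of the mechanism the question asks about, assuming a plain Python buffer (the class name ReplayBuffer and the capacity value are illustrative, not from the paper): transitions are stored as the agent acts, and training later draws random minibatches from the whole buffer instead of learning from consecutive steps.

    import random
    from collections import deque

    class ReplayBuffer:
        """Fixed-size store of (state, action, reward, next_state, done) tuples."""
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Uniform random sampling breaks the temporal correlation
            # between consecutive transitions.
            return random.sample(self.buffer, batch_size)

The benefits usually cited are data reuse (each transition can train the network many times) and decorrelated minibatches.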
12 votes • 1 answer

Why could my DDQN get significantly worse after beating the game repeatedly?

I've been trying to train a DDQN to play OpenAI Gym's CartPole-v1, but found that although it starts off well and starts getting full score (500) repeatedly (at around 600 episodes in the pic below), it then seems to go off the rails and do worse…
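As a reference point (not a diagnosis of this particular run): a DDQN bootstraps from the target
$$y = r + \gamma\, Q\big(s',\ \arg\max_{a'} Q(s', a'; \theta_{\text{online}});\ \theta_{\text{target}}\big),$$
so late-training collapse is often traced to the target network lagging too far behind the online one, or to an exploration schedule that keeps injecting random moves after the policy has converged.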
12 votes • 1 answer

Reinforcement learning: decreasing loss without increasing reward

I'm trying to solve OpenAI Gym's LunarLander-v2. I'm using the Deep Q-Learning algorithm. I have tried various hyperparameters, but I can't get a good score. Generally the loss decreases over many episodes but the reward doesn't improve much. How…
Atuos • 327 • 1 • 2 • 7
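One reason loss and reward can decouple, stated as an equation: deep Q-learning minimizes a bootstrapped objective such as
$$L(\theta) = \mathbb{E}_{(s,a,r,s')}\Big[\big(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta)\big)^{2}\Big],$$
which measures consistency with the network's own targets, not the return the policy actually earns. A shrinking $L(\theta)$ is therefore necessary but not sufficient for a rising score.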
11 votes • 2 answers

RL advantage function: why A = Q - V instead of A = V - Q?

In RL Course by David Silver - Lecture 7: Policy Gradient Methods, David explains what an advantage function is and how it is the difference between Q(s,a) and V(s). Preliminary, from this post: first recall that a policy $\pi$ is a mapping…
Kari • 2,756 • 2 • 21 • 51
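For readers skimming past: the convention follows from the definitions. With
$$A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s), \qquad V^{\pi}(s) = \mathbb{E}_{a \sim \pi}\big[Q^{\pi}(s, a)\big],$$
the advantage averages to zero under the policy, $\mathbb{E}_{a \sim \pi}[A^{\pi}(s, a)] = 0$, and a positive value flags an action better than the policy's average, which is exactly the sign convention one wants; $V - Q$ would invert it.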
10 votes • 2 answers

Is this a Q-learning algorithm or just brute force?

I have been playing with an algorithm that learns how to play tic-tac-toe. The basic pseudocode is:

    repeat many thousand times {
        repeat until game is over {
            if (board layout is unknown or exploring) {
                move randomly
            } else {
                move…
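What separates Q-learning from a brute-force outcome table is the bootstrapped update, not the table itself. A minimal tabular sketch (variable names are illustrative):

    from collections import defaultdict

    Q = defaultdict(float)            # Q[(state, action)] -> estimated value
    alpha, gamma = 0.1, 0.9           # learning rate and discount factor

    def q_update(state, action, reward, next_state, next_actions):
        # Bootstrapped target: immediate reward plus discounted best next value.
        best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

If the pseudocode above only records final outcomes for boards it has seen, with no such backup from successor states, it is closer to exhaustive memoization than to Q-learning.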
10 votes • 2 answers

Why does Q-learning diverge?

My Q-Learning algorithm's state values keep on diverging to infinity, which means my weights are diverging too. I use a neural network for my value-mapping. I've tried: Clipping the "reward + discount * maximum value of action" (max/min set to…
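Besides reward clipping, the usual stabilizers are a bounded loss, gradient clipping, and a slowly updated target network. A Keras-flavoured sketch under assumed shapes (4 state features, 2 actions; all sizes are placeholders):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(2)                      # one Q-value per action
    ])
    # Huber loss and clipnorm both bound the size of any single update.
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3, clipnorm=1.0),
                  loss=tf.keras.losses.Huber())

    target_model = tf.keras.models.clone_model(model)
    target_model.set_weights(model.get_weights())     # re-sync periodically, not every step

Computing targets from the frozen target_model removes the moving-target feedback loop that commonly drives the divergence described here.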
9 votes • 1 answer

Understanding Reinforcement Learning with Neural Net (Q-learning)

I am trying to understand reinforcement learning and Markov decision processes (MDPs) in the case where a neural net is being used as the function approximator. I'm having difficulty with the relationship between the MDP where the environment is…
CatsLoveJazz • 247 • 1 • 10
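The bridge the question is after can be written in one line: the tabular assignment becomes a semi-gradient step on the network weights,
$$\theta \leftarrow \theta + \alpha \big(r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta)\big)\, \nabla_{\theta} Q(s, a; \theta).$$
The MDP itself is unchanged; only the representation of $Q$ moves from a lookup table to the parameters $\theta$.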
8 votes • 2 answers

How to teach neural network a policy for a board game using reinforcement learning?

I need to use reinforcement learning to teach a neural net a policy for a board game. I chose Q-learning as the specific algorithm. I'd like a neural net to have the following structure: layer - rows * cols + 1 neurons - input - values of…
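A sketch of the described input layout in Keras (the board size, hidden width, and mean-squared-error loss are assumptions, not from the question):

    from tensorflow import keras

    rows, cols = 3, 3
    model = keras.Sequential([
        # rows * cols board cells plus the 1 extra input the question mentions.
        keras.layers.Dense(64, activation="relu", input_shape=(rows * cols + 1,)),
        keras.layers.Dense(rows * cols)   # one Q-value per board position
    ])
    model.compile(optimizer="adam", loss="mse")

With this output layout, the Q-learning target only modifies the output unit for the action actually taken.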
8 votes • 1 answer

Understanding advantage functions

The paper explaining 'Advantage Updating' as a method to improve Q-learning uses the following as its motivation. Q-learning requires relatively little computation per update, but it is useful to consider how the number of updates required scales…
mallochio • 91 • 1 • 6
7 votes • 2 answers

Representing similar states in reinforcement learning?

Let's say I'd like to design a Q-learning algorithm that learns to play poker. The number of different possible states is very large, but a lot are very similar: for example, if the initial state has 10 of spades, 4 of hearts, 6 of clubs on the table and…
shakedzy • 699 • 1 • 5 • 24
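The standard remedy is state abstraction: map raw states to a coarser feature key before the table lookup, so strategically equivalent hands share one entry. A hypothetical sketch (the Card objects with .rank and .suit are assumed, not from the question):

    def state_key(table_cards, hole_cards):
        # Collapse suit symmetry: hands that differ only by a suit
        # permutation map to the same key and share a Q-value.
        ranks = tuple(sorted(c.rank for c in table_cards + hole_cards))
        suited = len({c.suit for c in hole_cards}) == 1
        return (ranks, suited)

Function approximation (a neural network over such features) is the same idea taken further.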
7 votes • 3 answers

Why random sample from replay for DQN?

I'm trying to gain an intuitive understanding of deep reinforcement learning. In deep Q-networks (DQN) we store all states/actions/rewards in a memory array and at the end of the episode, "replay" them through our neural network. This makes…
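The short answer is that stochastic gradient descent assumes roughly i.i.d. minibatches, and consecutive frames of one episode are anything but. A toy contrast (the list of integers stands in for stored transitions):

    import random

    memory = list(range(1000))           # stand-in for stored transitions
    batch  = random.sample(memory, 32)   # spread across many episodes: decorrelated
    recent = memory[-32:]                # 32 consecutive steps: strongly correlated

Training on the random batch rather than the recent slice is the whole point of sampling from replay.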
7 votes • 1 answer

Simple Q-Table Learning: Understanding Example Code

I'm trying to follow a tutorial for Q-Table learning from this source, and am having difficulty understanding a small piece of the code. Here's the entire block:

    import gym
    import numpy as np

    env = gym.make('FrozenLake-v0')
    # Initialize table with…
aalberti333 • 195 • 1 • 5
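For orientation, the block presumably continues along these lines; this is a hedged reconstruction of the common FrozenLake Q-table loop (it assumes the old gym API, where reset() returns just the observation), not the tutorial's exact code:

    import gym
    import numpy as np

    env = gym.make('FrozenLake-v0')
    Q = np.zeros([env.observation_space.n, env.action_space.n])
    lr, gamma = 0.8, 0.95

    for episode in range(2000):
        s = env.reset()
        done = False
        while not done:
            # Greedy action plus decaying noise, a stand-in for epsilon-greedy.
            noise = np.random.randn(1, env.action_space.n) * (1.0 / (episode + 1))
            a = np.argmax(Q[s, :] + noise)
            s2, r, done, _ = env.step(a)
            Q[s, a] += lr * (r + gamma * np.max(Q[s2, :]) - Q[s, a])
            s = s2

The noise term tends to be the piece that confuses readers: it anneals exploration without an explicit epsilon parameter.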
7 votes • 1 answer

What are the advantages / disadvantages of off-policy RL vs on-policy RL?

There are various algorithms for reinforcement learning (RL). One way to group them is by "off-policy" and "on-policy". I've heard that SARSA is on-policy, while Q-learning is off-policy. I think they work as follows: My questions are: How exactly…
Martin Thoma • 19,540 • 36 • 98 • 170
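The one-line distinction, for reference: SARSA updates toward the action it actually takes next,
$$Q(s, a) \leftarrow Q(s, a) + \alpha \big[r + \gamma\, Q(s', a') - Q(s, a)\big],$$
while Q-learning updates toward the greedy action regardless of what the behaviour policy does,
$$Q(s, a) \leftarrow Q(s, a) + \alpha \big[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\big].$$
Evaluating a target policy different from the one generating the data is what makes Q-learning off-policy, and is also what lets it learn from replayed or external experience.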
6 votes • 1 answer

Keras input dimension bug?

Keras has a problem with the input dimension. My first layer looks like this:

    model.add(Dense(128, batch_size=1, input_shape=(150,),
                    kernel_initializer="he_uniform",
                    kernel_regularizer=regularizers.l2(0.01),
                    activation="elu"))

As you can see the…
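The likely resolution is a shape mismatch rather than a Keras bug: with input_shape=(150,), the model expects a rank-2 batch of shape (batch_size, 150), so a single sample has to be reshaped before prediction. A minimal illustration (the zero vector is a placeholder input):

    import numpy as np

    x = np.zeros(150)              # one sample, shape (150,)
    x_batch = x.reshape(1, 150)    # what the model expects: (1, 150)

Passing the bare (150,) array is the most common source of this error.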
5 votes • 1 answer

Q-learning: how to use experience replay when playing against another agent?

I am currently trying to create a tic-tac-toe Q-learning neural network to introduce me to reinforcement learning; however, it didn't work, so I decided to try a simpler project requiring a network to train against static data rather than another…
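One pattern that fits this setting: store transitions from each player's own perspective and replay them exactly as in the single-agent case, treating the opponent's move as part of the environment's response. A hypothetical sketch (buffer.push and encode are assumed helpers, not from the question):

    def store_step(buffer, encode, state, action, reward, next_state, done):
        # next_state is the board after BOTH the agent's move and the
        # opponent's reply, so the opponent is folded into the environment.
        buffer.push(encode(state), action, reward, encode(next_state), done)

Self-play against a periodically frozen copy of the network is the usual extension of this idea.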
1 2 3 … 8 9