Questions tagged [q-learning]

A model-free reinforcement learning algorithm that learns the value of taking each action in each state.

128 questions
52 votes • 3 answers

What is "experience replay" and what are its benefits?

I've been reading Google DeepMind's Atari paper and I'm trying to understand the concept of "experience replay". Experience replay comes up in a lot of other reinforcement learning papers (particularly the AlphaGo paper), so I want to understand…
Ryan Zotti • 4,209 • 3 • 21 • 33
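For context, a minimal sketch of the mechanism the question asks about, assuming a plain Python buffer (the class name ReplayBuffer and the capacity value are illustrative, not from the paper): transitions are stored as the agent acts, and training later draws random minibatches from the whole buffer instead of learning from consecutive steps.

    import random
    from collections import deque

    class ReplayBuffer:
        """Fixed-size store of (state, action, reward, next_state, done) tuples."""
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Uniform random sampling breaks the temporal correlation
            # between consecutive transitions.
            return random.sample(self.buffer, batch_size)

The benefits usually cited are data reuse (each transition can train the network many times) and decorrelated minibatches.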
12 votes • 1 answer

Why could my DDQN get significantly worse after beating the game repeatedly?

I've been trying to train a DDQN to play OpenAI Gym's CartPole-v1, but found that although it starts off well and starts getting full score (500) repeatedly (at around 600 episodes in the pic below), it then seems to go off the rails and do worse…
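As a reference point (not a diagnosis of this particular run): a DDQN bootstraps from the target
$$y = r + \gamma\, Q\big(s',\ \arg\max_{a'} Q(s', a'; \theta_{\text{online}});\ \theta_{\text{target}}\big),$$
so late-training collapse is often traced to the target network lagging too far behind the online one, or to an exploration schedule that keeps injecting random moves after the policy has converged.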
12 votes • 1 answer

Reinforcement learning: decreasing loss without increasing reward

I'm trying to solve OpenAI Gym's LunarLander-v2. I'm using the Deep Q-Learning algorithm. I have tried various hyperparameters, but I can't get a good score. Generally the loss decreases over many episodes but the reward doesn't improve much. How…
Atuos • 327 • 1 • 2 • 7
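One reason loss and reward can decouple, stated as an equation: deep Q-learning minimizes a bootstrapped objective such as
$$L(\theta) = \mathbb{E}_{(s,a,r,s')}\Big[\big(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta)\big)^{2}\Big],$$
which measures consistency with the network's own targets, not the return the policy actually earns. A shrinking $L(\theta)$ is therefore necessary but not sufficient for a rising score.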
11 votes • 2 answers

RL advantage function: why A = Q - V instead of A = V - Q?

In RL Course by David Silver - Lecture 7: Policy Gradient Methods, David explains what an advantage function is and how it is the difference between Q(s,a) and V(s). Preliminary, from this post: first recall that a policy $\pi$ is a mapping…
Kari • 2,756 • 2 • 21 • 51
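For readers skimming past: the convention follows from the definitions. With
$$A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s), \qquad V^{\pi}(s) = \mathbb{E}_{a \sim \pi}\big[Q^{\pi}(s, a)\big],$$
the advantage averages to zero under the policy, $\mathbb{E}_{a \sim \pi}[A^{\pi}(s, a)] = 0$, and a positive value flags an action better than the policy's average, which is exactly the sign convention one wants; $V - Q$ would invert it.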
10 votes • 2 answers

Is this a Q-learning algorithm or just brute force?

I have been playing with an algorithm that learns how to play tic-tac-toe. The basic pseudocode is:

    repeat many thousand times {
        repeat until game is over {
            if (board layout is unknown or exploring) {
                move randomly
            } else {
                move…
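What separates Q-learning from a brute-force outcome table is the bootstrapped update, not the table itself. A minimal tabular sketch (variable names are illustrative):

    from collections import defaultdict

    Q = defaultdict(float)            # Q[(state, action)] -> estimated value
    alpha, gamma = 0.1, 0.9           # learning rate and discount factor

    def q_update(state, action, reward, next_state, next_actions):
        # Bootstrapped target: immediate reward plus discounted best next value.
        best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

If the pseudocode above only records final outcomes for boards it has seen, with no such backup from successor states, it is closer to exhaustive memoization than to Q-learning.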
10 votes • 2 answers

Why does Q-learning diverge?

My Q-Learning algorithm's state values keep on diverging to infinity, which means my weights are diverging too. I use a neural network for my value-mapping. I've tried: Clipping the "reward + discount * maximum value of action" (max/min set to…
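Besides reward clipping, the usual stabilizers are a bounded loss, gradient clipping, and a slowly updated target network. A Keras-flavoured sketch under assumed shapes (4 state features, 2 actions; all sizes are placeholders):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(2)                      # one Q-value per action
    ])
    # Huber loss and clipnorm both bound the size of any single update.
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3, clipnorm=1.0),
                  loss=tf.keras.losses.Huber())

    target_model = tf.keras.models.clone_model(model)
    target_model.set_weights(model.get_weights())     # re-sync periodically, not every step

Computing targets from the frozen target_model removes the moving-target feedback loop that commonly drives the divergence described here.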
9 votes • 1 answer

Understanding Reinforcement Learning with Neural Net (Q-learning)

I am trying to understand reinforcement learning and Markov decision processes (MDPs) in the case where a neural net is being used as the function approximator. I'm having difficulty with the relationship between the MDP where the environment is…
CatsLoveJazz • 247 • 1 • 10
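The bridge the question is after can be written in one line: the tabular assignment becomes a semi-gradient step on the network weights,
$$\theta \leftarrow \theta + \alpha \big(r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta)\big)\, \nabla_{\theta} Q(s, a; \theta).$$
The MDP itself is unchanged; only the representation of $Q$ moves from a lookup table to the parameters $\theta$.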
8 votes • 2 answers

How to teach neural network a policy for a board game using reinforcement learning?

I need to use reinforcement learning to teach a neural net a policy for a board game. I chose Q-learning as the specific algorithm. I'd like a neural net to have the following structure: layer - rows * cols + 1 neurons - input - values of…
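A sketch of the described input layout in Keras (the board size, hidden width, and mean-squared-error loss are assumptions, not from the question):

    from tensorflow import keras

    rows, cols = 3, 3
    model = keras.Sequential([
        # rows * cols board cells plus the 1 extra input the question mentions.
        keras.layers.Dense(64, activation="relu", input_shape=(rows * cols + 1,)),
        keras.layers.Dense(rows * cols)   # one Q-value per board position
    ])
    model.compile(optimizer="adam", loss="mse")

With this output layout, the Q-learning target only modifies the output unit for the action actually taken.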
8 votes • 1 answer

Understanding advantage functions

The paper explaining 'Advantage Updating' as a method to improve Q-learning uses the following as its motivation. Q-learning requires relatively little computation per update, but it is useful to consider how the number of updates required scales…
mallochio • 91 • 1 • 6
7 votes • 2 answers

Representing similar states in reinforcement learning?

Let's say I'd like to design a Q-learning algorithm that learns to play poker. The number of different possible states is very large, but a lot are very similar: for example, if the initial state has 10 of spades, 4 of hearts, 6 of clubs on the table and…
shakedzy • 699 • 1 • 5 • 24
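The standard remedy is state abstraction: map raw states to a coarser feature key before the table lookup, so strategically equivalent hands share one entry. A hypothetical sketch (the Card objects with .rank and .suit are assumed, not from the question):

    def state_key(table_cards, hole_cards):
        # Collapse suit symmetry: hands that differ only by a suit
        # permutation map to the same key and share a Q-value.
        ranks = tuple(sorted(c.rank for c in table_cards + hole_cards))
        suited = len({c.suit for c in hole_cards}) == 1
        return (ranks, suited)

Function approximation (a neural network over such features) is the same idea taken further.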
7 votes • 3 answers

Why random sample from replay for DQN?

I'm trying to gain an intuitive understanding of deep reinforcement learning. In deep Q-networks (DQN) we store all states/actions/rewards in a memory array and at the end of the episode, "replay" them through our neural network. This makes…
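The short answer is that stochastic gradient descent assumes roughly i.i.d. minibatches, and consecutive frames of one episode are anything but. A toy contrast (the list of integers stands in for stored transitions):

    import random

    memory = list(range(1000))           # stand-in for stored transitions
    batch  = random.sample(memory, 32)   # spread across many episodes: decorrelated
    recent = memory[-32:]                # 32 consecutive steps: strongly correlated

Training on the random batch rather than the recent slice is the whole point of sampling from replay.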
7 votes • 1 answer

Simple Q-Table Learning: Understanding Example Code

I'm trying to follow a tutorial for Q-Table learning from this source, and am having difficulty understanding a small piece of the code. Here's the entire block:

    import gym
    import numpy as np

    env = gym.make('FrozenLake-v0')
    # Initialize table with…
aalberti333 • 195 • 1 • 5
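For orientation, the block presumably continues along these lines; this is a hedged reconstruction of the common FrozenLake Q-table loop (it assumes the old gym API, where reset() returns just the observation), not the tutorial's exact code:

    import gym
    import numpy as np

    env = gym.make('FrozenLake-v0')
    Q = np.zeros([env.observation_space.n, env.action_space.n])
    lr, gamma = 0.8, 0.95

    for episode in range(2000):
        s = env.reset()
        done = False
        while not done:
            # Greedy action plus decaying noise, a stand-in for epsilon-greedy.
            noise = np.random.randn(1, env.action_space.n) * (1.0 / (episode + 1))
            a = np.argmax(Q[s, :] + noise)
            s2, r, done, _ = env.step(a)
            Q[s, a] += lr * (r + gamma * np.max(Q[s2, :]) - Q[s, a])
            s = s2

The noise term tends to be the piece that confuses readers: it anneals exploration without an explicit epsilon parameter.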
7 votes • 1 answer

What are the advantages / disadvantages of off-policy RL vs on-policy RL?

There are various algorithms for reinforcement learning (RL). One way to group them is by "off-policy" and "on-policy". I've heard that SARSA is on-policy, while Q-learning is off-policy. I think they work as follows: My questions are: How exactly…
Martin Thoma • 19,540 • 36 • 98 • 170
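The one-line distinction, for reference: SARSA updates toward the action it actually takes next,
$$Q(s, a) \leftarrow Q(s, a) + \alpha \big[r + \gamma\, Q(s', a') - Q(s, a)\big],$$
while Q-learning updates toward the greedy action regardless of what the behaviour policy does,
$$Q(s, a) \leftarrow Q(s, a) + \alpha \big[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\big].$$
Evaluating a target policy different from the one generating the data is what makes Q-learning off-policy, and is also what lets it learn from replayed or external experience.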
6 votes • 1 answer

Keras input dimension bug?

Keras has a problem with the input dimension. My first layer looks like this:

    model.add(Dense(128, batch_size=1, input_shape=(150,),
                    kernel_initializer="he_uniform",
                    kernel_regularizer=regularizers.l2(0.01),
                    activation="elu"))

As you can see the…
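The likely resolution is a shape mismatch rather than a Keras bug: with input_shape=(150,), the model expects a rank-2 batch of shape (batch_size, 150), so a single sample has to be reshaped before prediction. A minimal illustration (the zero vector is a placeholder input):

    import numpy as np

    x = np.zeros(150)              # one sample, shape (150,)
    x_batch = x.reshape(1, 150)    # what the model expects: (1, 150)

Passing the bare (150,) array is the most common source of this error.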
5 votes • 1 answer

Q-learning: how to use experience replay when playing against another agent?

I am currently trying to create a tic-tac-toe Q-learning neural network to introduce me to reinforcement learning; however, it didn't work, so I decided to try a simpler project requiring a network to train against static data rather than another…
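One pattern that fits this setting: store transitions from each player's own perspective and replay them exactly as in the single-agent case, treating the opponent's move as part of the environment's response. A hypothetical sketch (buffer.push and encode are assumed helpers, not from the question):

    def store_step(buffer, encode, state, action, reward, next_state, done):
        # next_state is the board after BOTH the agent's move and the
        # opponent's reply, so the opponent is folded into the environment.
        buffer.push(encode(state), action, reward, encode(next_state), done)

Self-play against a periodically frozen copy of the network is the usual extension of this idea.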
1 2 3 … 8 9