Questions tagged [gradient-descent]
47 questions
7
votes
0 answers
Algorithms for curve construction
I am interested in algorithms that construct continuous curves between two points in such a way as to minimize an energy functional of the curve. What sorts of algorithms are most commonly used for such tasks?
More formally, given two points $a$ and $b$, and…
user3658307
- 171
- 6
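A common baseline for this kind of problem is to discretize the curve into a finite sequence of points and run gradient descent on a discretized energy. The sketch below is illustrative only; the energy (sum of squared segment lengths), step size, and point count are assumptions rather than anything from the question.

```python
import numpy as np

def minimize_curve(a, b, n=20, lr=0.1, steps=500):
    """Gradient descent on sum_i ||p_{i+1} - p_i||^2 with endpoints a, b fixed."""
    t = np.linspace(0, 1, n + 2)[1:-1, None]
    x = (1 - t) * a + t * b                      # interior points on the chord
    x += 0.1 * np.random.default_rng(0).normal(size=x.shape)  # perturb so descent has work to do
    for _ in range(steps):
        pts = np.vstack([a, x, b])               # full sequence with fixed endpoints
        # d/dp_i sum_i ||p_{i+1} - p_i||^2 = 2 (2 p_i - p_{i-1} - p_{i+1})
        grad = 2 * (2 * pts[1:-1] - pts[:-2] - pts[2:])
        x -= lr * grad
    return np.vstack([a, x, b])

curve = minimize_curve(np.array([0.0, 0.0]), np.array([1.0, 1.0]))
```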
4
votes
1 answer
How to show that cross entropy is minimized?
This question is taken from the book Neural Networks and Deep Learning by Michael Nielsen.
The question:
For a single neuron, it is argued that the cross-entropy is small if σ(z) ≈ y for all training inputs. The argument relied on y being equal to either…
Kartik chincholikar
- 55
- 1
- 7
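For context, assuming the single-neuron cross-entropy from Nielsen's chapter, $C = -\left[\, y \ln a + (1-y)\ln(1-a) \,\right]$ with $a = \sigma(z) \in (0,1)$, the key step is
$$\frac{\partial C}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a} = \frac{a-y}{a(1-a)},$$
so for $y \in \{0,1\}$ the cost decreases monotonically as $a \to y$ and approaches its infimum $0$ exactly when $\sigma(z) \approx y$.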
4
votes
1 answer
Why do we use the log in gradient-based reinforcement learning algorithms?
I've been reading some papers on reinforcement learning.
$$\Delta w=\frac{\partial \ln p_w}{\partial w}\,r$$
I often see expressions similar to the one above, where the weights (denoted by $w$) are updated following the partial derivative of the…
user65539
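The usual justification is the likelihood-ratio (log-derivative) identity; sketched here under the standard policy-gradient assumptions (the reward $r$ does not depend on $w$ directly):
$$\nabla_w \mathbb{E}_{p_w}[r] = \int r\,\nabla_w p_w = \int r\,p_w\,\nabla_w \ln p_w = \mathbb{E}_{p_w}\!\left[\, r\,\nabla_w \ln p_w \,\right],$$
using $\nabla_w p_w = p_w \nabla_w \ln p_w$. The log turns the gradient of an expectation into an expectation of a gradient, which can be estimated from samples.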
4
votes
1 answer
MDS minimization with gradient descent
I have the following multiple dimensional scaling (MDS) minimization problem in vectors $v_1, v_2, \dots, v_n \in \mathbb R^2$
$$\min_{v_1, v_2, \dots, v_n} \sum_{i,j} \left( \|v_i - v_j\| - d_{i,j} \right)^2$$
which I wish to solve numerically…
CodeKingPlusPlus
- 517
- 1
- 6
- 14
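One direct numerical option is plain gradient descent on the stress itself. The sketch below is only an illustration (step size, iteration count, and random initialization are assumptions) and sidesteps the non-differentiability at $v_i = v_j$ by zeroing the diagonal terms.

```python
import numpy as np

def mds_gd(D, dim=2, lr=1e-3, steps=2000, seed=0):
    """Gradient descent on sum_{i,j} (||v_i - v_j|| - D_ij)^2."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    V = rng.normal(size=(n, dim))                # random initial embedding
    for _ in range(steps):
        diff = V[:, None, :] - V[None, :, :]     # (n, n, dim): v_i - v_j
        dist = np.linalg.norm(diff, axis=-1)     # (n, n) pairwise distances
        np.fill_diagonal(dist, 1.0)              # avoid dividing by zero
        coef = 2 * (dist - D) / dist             # per-pair scalar weight
        np.fill_diagonal(coef, 0.0)
        # Each unordered pair appears twice in the ordered double sum,
        # hence the extra factor of 2 in the gradient.
        grad = 2 * (coef[:, :, None] * diff).sum(axis=1)
        V -= lr * grad
    return V
```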
4
votes
1 answer
Why update weights and biases after training a neural network on the whole set of training samples?
I am reading the book Neural Networks and Deep Learning by Michael Nielsen. In the second chapter of his book, he describes the following algorithm for updating weights and biases for a neural network:
In the 2nd step, the algorithm computes the…
user5139637
- 185
- 5
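For reference, the pattern the chapter describes is: compute a per-example gradient with backpropagation, accumulate it over the batch, then apply one averaged update. A minimal sketch, with a hypothetical `grad_fn` standing in for backpropagation:

```python
import numpy as np

def batch_update(w, b, batch, grad_fn, eta):
    """grad_fn(w, b, x, y) -> (dw, db) for one training example (assumed interface)."""
    dw_sum, db_sum = np.zeros_like(w), np.zeros_like(b)
    for x, y in batch:
        dw, db = grad_fn(w, b, x, y)   # per-example gradient via backprop
        dw_sum += dw
        db_sum += db
    m = len(batch)
    # A single step with the gradient averaged over the whole batch.
    return w - eta * dw_sum / m, b - eta * db_sum / m
```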
3
votes
1 answer
Is it possible to solve the Mountain Car reinforcement learning task with linear Q-Learning using the state as direct input?
I'm trying to solve the Mountain Car task on OpenAI Gym (reach the top in 110 steps or less, having a maximum of 200 steps per episode) using linear Q-learning (the algorithm in figure 11.16, except using maxQ at s' instead of the actual a', as…
rcpinto
- 470
- 3
- 15
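For concreteness, a linear Q-learner that uses the raw (position, velocity) state as the per-action feature vector looks roughly like the sketch below; whether that representation is rich enough to solve the task is exactly what the question asks, so this is illustrative only (step size and discount are assumptions).

```python
import numpy as np

N_ACTIONS = 3                         # MountainCar: push left, no-op, push right

def q_values(W, s):
    return W @ s                      # Q(s, a) = W[a] . s, with W of shape (N_ACTIONS, 2)

def td_update(W, s, a, r, s_next, done, alpha=0.01, gamma=0.99):
    target = r if done else r + gamma * np.max(q_values(W, s_next))
    td_error = target - q_values(W, s)[a]
    W[a] += alpha * td_error * s      # gradient of W[a] . s with respect to W[a] is s
    return W
```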
3
votes
1 answer
Why does updating only a part of the neural network's weights not work?
I am having a problem with my deep neural network program, which uses Theano. In my deep neural network, I have several layers that predict an output given a certain input. Because of an issue when compiling with Theano, I have to debug my…
The Lazy Log
- 131
- 4
3
votes
1 answer
Speed up minimizing quadratic function by FFT
I'm trying to understand the following excerpt from a paper:
Subproblem 1: computing $S$. The $S$ estimation subproblem corresponds to minimizing
$$
\sum_{p}(S_p - I_p)^2 + \beta((\partial_xS_p - h_p)^2 + (\partial_yS_p - v_p)^2) \tag…
Yu Dai
- 131
- 2
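For context: the objective is quadratic in $S$, so its minimizer solves a linear system, and under periodic boundary conditions the difference operators diagonalize under the 2-D FFT, which gives the standard closed form (operations element-wise in the frequency domain, $\overline{\,\cdot\,}$ denoting complex conjugation):
$$S = \mathcal{F}^{-1}\!\left(\frac{\mathcal{F}(I) + \beta\left(\overline{\mathcal{F}(\partial_x)}\,\mathcal{F}(h) + \overline{\mathcal{F}(\partial_y)}\,\mathcal{F}(v)\right)}{\mathcal{F}(\mathbf{1}) + \beta\left(|\mathcal{F}(\partial_x)|^2 + |\mathcal{F}(\partial_y)|^2\right)}\right).$$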
3
votes
2 answers
Gradient descent overshoot - why does it diverge?
I'm thinking about gradient descent, but I don't get it.
I understand that it can overshoot the minimum when the learning rate is too large. But I can't understand why it would diverge.
Let's say we have
$$J(\theta_0, \theta_1) =…
user47979
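A one-dimensional example makes the divergence concrete. Take $J(\theta) = c\,\theta^2$ with $c > 0$; then
$$\theta_{t+1} = \theta_t - \alpha J'(\theta_t) = (1 - 2\alpha c)\,\theta_t, \qquad |\theta_t| = |1 - 2\alpha c|^{\,t}\,|\theta_0|,$$
so once $\alpha > 1/c$ the factor $|1 - 2\alpha c|$ exceeds $1$: each step overshoots to a point with a larger gradient, and the iterates grow geometrically instead of converging.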
3
votes
1 answer
Mathematical optimization with thresholded optimization function
Gradient descent can be used to minimize an objective function $\Phi:\mathbb{R}^d \to \mathbb{R}$, if we know how to evaluate $\Phi$ on any input of our choice.
However, my situation is a little different. I have an objective function $\Phi$ of the…
D.W.
- 167,959
- 22
- 232
- 500
2
votes
1 answer
Is there a universal learning rate for neural networks?
I'm currently creating a neural network with backpropagation/gradient descent. There is a hyperparameter called the "learning rate" (η), which has to be chosen so as not to overshoot the minimum of the cost function when doing…
LU15.W1R7H
- 23
- 2
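One standard piece of context (not a universal constant): if the cost $J$ is $L$-smooth, the descent lemma gives
$$J\!\left(\theta - \eta\,\nabla J(\theta)\right) \le J(\theta) - \eta\left(1 - \tfrac{L\eta}{2}\right)\|\nabla J(\theta)\|^2,$$
which guarantees progress only for $\eta < 2/L$; since $L$ depends on the network and the data, the admissible learning rate is problem-dependent rather than universal.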
2
votes
0 answers
About gradient descent on non-convex functions
There is this "folklore" result that gradient descent on a non-convex function takes $O(\frac n {\epsilon^2})$ steps to get to a point whose gradient norm is below $\epsilon$ and with SGD this takes $O(\frac {1}{\epsilon^4})$ steps.
Can someone…
gradstudent
- 493
- 2
- 8
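For reference, the gradient-descent half of the folklore bound follows from smoothness alone: for an $L$-smooth $f$ with step size $1/L$,
$$f(x_{t+1}) \le f(x_t) - \frac{1}{2L}\,\|\nabla f(x_t)\|^2 \quad\Longrightarrow\quad \min_{t < T}\,\|\nabla f(x_t)\|^2 \le \frac{2L\left(f(x_0) - f^\ast\right)}{T},$$
so $T = O(1/\epsilon^2)$ iterations suffice to find a point with gradient norm at most $\epsilon$.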
2
votes
1 answer
Calculating gradient in a neural net using batches
I am a CS student learning about neural nets. Currently I am confused about how to train a neural net in batches. If I calculate the error over a batch, I get a vector of errors, e.g. real1 − predicted1, real2 − predicted2, etc. How do I then…
swedishfished
- 121
- 1
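A minimal illustration with a linear model and squared error (the model is an assumption, chosen only to make the averaging explicit): the per-example errors are not combined directly; each example contributes a gradient, and those gradients are averaged over the batch.

```python
import numpy as np

def batch_gradient(w, X, y):
    """Average gradient of the squared error over a batch given as rows of X."""
    errors = X @ w - y                 # one error per example
    # Per-example gradient is 2 * error_i * x_i; X.T @ errors sums them over the batch.
    return 2 * X.T @ errors / len(y)

w = np.zeros(3)
X = np.random.randn(8, 3)              # a batch of 8 examples, 3 features
y = np.random.randn(8)
w -= 0.1 * batch_gradient(w, X, y)     # one gradient-descent step
```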
2
votes
0 answers
Lazy Stochastic Gradient Descent: Multiplicative vs Additive
I am reading Bob Carpenter's note at http://lingpipe.files.wordpress.com/2008/04/lazysgdregression.pdf and William Cohen's note at http://www.cs.cmu.edu/~wcohen/10-605/notes/sgd-notes.pdf.
They described the same technique to lazily decay the…
user59369
- 21
- 1
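Without restating either note, the general idea behind the multiplicative variant can be sketched as follows (variable names and the exact update form are illustrative assumptions, not taken from the linked notes): record when each weight was last touched and apply all missed decay steps at once when its feature reappears.

```python
def lazy_update(w, last_step, t, feature_ids, grads, eta, lam):
    """One SGD step that touches only the features active at step t."""
    for j, g in zip(feature_ids, grads):
        # Catch up on the (t - last_step[j]) decay steps this weight missed.
        w[j] *= (1 - eta * lam) ** (t - last_step[j])
        w[j] -= eta * g                # the usual gradient step for this feature
        last_step[j] = t
    return w
```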
2
votes
0 answers
Computing $\mathrm{tr}(X^{-1}Y)$ efficiently
I know that one can compute the expression $X^{-1}\mathbf{v}$ quickly with conjugate gradient method. Is there a similar approach for computing $\mathrm{tr}(X^{-1}Y)$?
Similarly interesting to me are $\mathrm{tr}(X^{-1})$ and…
R S
- 129
- 2
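One common approach, sketched here under the assumption that $X$ is symmetric positive definite, is a Hutchinson-type estimator: $\mathrm{tr}(X^{-1}Y) = \mathbb{E}\!\left[z^\top X^{-1} Y z\right]$ for random $z$ with $\mathbb{E}[zz^\top] = I$, so each sample costs one matrix-vector product with $Y$ and one CG solve with $X$.

```python
import numpy as np
from scipy.sparse.linalg import cg

def trace_inv_times(X, Y, n_samples=50, seed=0):
    """Stochastic estimate of tr(X^{-1} Y) for symmetric positive-definite X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=n)    # Rademacher probe vector
        u, _ = cg(X, Y @ z)                    # u = X^{-1} (Y z) via conjugate gradient
        total += z @ u                         # one sample of z^T X^{-1} Y z
    return total / n_samples
```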