Questions tagged [gradient-descent]
47 questions
7
votes
0 answers
Algorithms for curve construction
I am interested in algorithms that construct continuous curves between two points in such a way as to minimize an energy functional of the curve. What sorts of algorithms are most commonly used for such tasks?
More formally, given two points $a$ and $b$, and…
user3658307
- 171
- 6
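A common baseline for this kind of problem is to discretize the curve into a finite sequence of points and run gradient descent on a discretized energy. The sketch below is illustrative only; the energy (sum of squared segment lengths), step size, and point count are assumptions rather than anything from the question.

```python
import numpy as np

def minimize_curve(a, b, n=20, lr=0.1, steps=500):
    """Gradient descent on sum_i ||p_{i+1} - p_i||^2 with endpoints a, b fixed."""
    t = np.linspace(0, 1, n + 2)[1:-1, None]
    x = (1 - t) * a + t * b                      # interior points on the chord
    x += 0.1 * np.random.default_rng(0).normal(size=x.shape)  # perturb so descent has work to do
    for _ in range(steps):
        pts = np.vstack([a, x, b])               # full sequence with fixed endpoints
        # d/dp_i sum_i ||p_{i+1} - p_i||^2 = 2 (2 p_i - p_{i-1} - p_{i+1})
        grad = 2 * (2 * pts[1:-1] - pts[:-2] - pts[2:])
        x -= lr * grad
    return np.vstack([a, x, b])

curve = minimize_curve(np.array([0.0, 0.0]), np.array([1.0, 1.0]))
```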
4
votes
1 answer
How to show that cross entropy is minimized?
This question is taken from the book Neural Networks and Deep Learning by Michael Nielsen.
The question:
For a single neuron, it is argued that the cross-entropy is small if σ(z) ≈ y for all training inputs. The argument relied on y being equal to either…
Kartik chincholikar
- 55
- 1
- 7
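For context, assuming the single-neuron cross-entropy from Nielsen's chapter, $C = -\left[\, y \ln a + (1-y)\ln(1-a) \,\right]$ with $a = \sigma(z) \in (0,1)$, the key step is
$$\frac{\partial C}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a} = \frac{a-y}{a(1-a)},$$
so for $y \in \{0,1\}$ the cost decreases monotonically as $a \to y$ and approaches its infimum $0$ exactly when $\sigma(z) \approx y$.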
4
votes
1 answer
Why do we use the log in gradient-based reinforcement learning algorithms?
I've been reading some papers on reinforcement learning.
$$\Delta w=\frac{\partial \ln p_w}{\partial w}\,r$$
I often see expressions similar to the one above, where the weights (denoted by $w$) are updated following the partial derivative of the…
user65539
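The usual justification is the likelihood-ratio (log-derivative) identity; sketched here under the standard policy-gradient assumptions (the reward $r$ does not depend on $w$ directly):
$$\nabla_w \mathbb{E}_{p_w}[r] = \int r\,\nabla_w p_w = \int r\,p_w\,\nabla_w \ln p_w = \mathbb{E}_{p_w}\!\left[\, r\,\nabla_w \ln p_w \,\right],$$
using $\nabla_w p_w = p_w \nabla_w \ln p_w$. The log turns the gradient of an expectation into an expectation of a gradient, which can be estimated from samples.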
4
votes
1 answer
MDS minimization with gradient descent
I have the following multiple dimensional scaling (MDS) minimization problem in vectors $v_1, v_2, \dots, v_n \in \mathbb R^2$
$$\min_{v_1, v_2, \dots, v_n} \sum_{i,j} \left( \|v_i - v_j\| - d_{i,j} \right)^2$$
which I wish to solve numerically…
CodeKingPlusPlus
- 517
- 1
- 6
- 14
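One direct numerical option is plain gradient descent on the stress itself. The sketch below is only an illustration (step size, iteration count, and random initialization are assumptions) and sidesteps the non-differentiability at $v_i = v_j$ by zeroing the diagonal terms.

```python
import numpy as np

def mds_gd(D, dim=2, lr=1e-3, steps=2000, seed=0):
    """Gradient descent on sum_{i,j} (||v_i - v_j|| - D_ij)^2."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    V = rng.normal(size=(n, dim))                # random initial embedding
    for _ in range(steps):
        diff = V[:, None, :] - V[None, :, :]     # (n, n, dim): v_i - v_j
        dist = np.linalg.norm(diff, axis=-1)     # (n, n) pairwise distances
        np.fill_diagonal(dist, 1.0)              # avoid dividing by zero
        coef = 2 * (dist - D) / dist             # per-pair scalar weight
        np.fill_diagonal(coef, 0.0)
        # Each unordered pair appears twice in the ordered double sum,
        # hence the extra factor of 2 in the gradient.
        grad = 2 * (coef[:, :, None] * diff).sum(axis=1)
        V -= lr * grad
    return V
```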
4
votes
1 answer
Why update weights and biases after training a neural network on the whole set of training samples?
I am reading the book Neural Networks and Deep Learning by Michael Nielsen. In the second chapter of his book, he describes the following algorithm for updating weights and biases for a neural network:
In the 2nd step, the algorithm computes the…
user5139637
- 185
- 5
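For reference, the pattern the chapter describes is: compute a per-example gradient with backpropagation, accumulate it over the batch, then apply one averaged update. A minimal sketch, with a hypothetical `grad_fn` standing in for backpropagation:

```python
import numpy as np

def batch_update(w, b, batch, grad_fn, eta):
    """grad_fn(w, b, x, y) -> (dw, db) for one training example (assumed interface)."""
    dw_sum, db_sum = np.zeros_like(w), np.zeros_like(b)
    for x, y in batch:
        dw, db = grad_fn(w, b, x, y)   # per-example gradient via backprop
        dw_sum += dw
        db_sum += db
    m = len(batch)
    # A single step with the gradient averaged over the whole batch.
    return w - eta * dw_sum / m, b - eta * db_sum / m
```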
3
votes
1 answer
Is it possible to solve the Mountain Car reinforcement learning task with linear Q-Learning using the state as direct input?
I'm trying to solve the Mountain Car task on OpenAI Gym (reach the top in 110 steps or less, having a maximum of 200 steps per episode) using linear Q-learning (the algorithm in figure 11.16, except using maxQ at s' instead of the actual a', as…
rcpinto
- 470
- 3
- 15
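For concreteness, a linear Q-learner that uses the raw (position, velocity) state as the per-action feature vector looks roughly like the sketch below; whether that representation is rich enough to solve the task is exactly what the question asks, so this is illustrative only (step size and discount are assumptions).

```python
import numpy as np

N_ACTIONS = 3                         # MountainCar: push left, no-op, push right

def q_values(W, s):
    return W @ s                      # Q(s, a) = W[a] . s, with W of shape (N_ACTIONS, 2)

def td_update(W, s, a, r, s_next, done, alpha=0.01, gamma=0.99):
    target = r if done else r + gamma * np.max(q_values(W, s_next))
    td_error = target - q_values(W, s)[a]
    W[a] += alpha * td_error * s      # gradient of W[a] . s with respect to W[a] is s
    return W
```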
3
votes
1 answer
Why does updating only a part of the neural network's weights not work?
I am having a problem with my deep neural network program, which uses Theano. In my deep neural network, I have several layers that predict an output given a certain input. Because of an issue when compiling with Theano, I have to debug my…
The Lazy Log
- 131
- 4
3
votes
1 answer
Speed up minimizing quadratic function by FFT
I'm trying to understand the following excerpt from a paper:
Subproblem 1: computing $S$. The $S$ estimation subproblem corresponds to minimizing
$$
\sum_{p}(S_p - I_p)^2 + \beta((\partial_xS_p - h_p)^2 + (\partial_yS_p - v_p)^2) \tag…
Yu Dai
- 131
- 2
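For context: the objective is quadratic in $S$, so its minimizer solves a linear system, and under periodic boundary conditions the difference operators diagonalize under the 2-D FFT, which gives the standard closed form (operations element-wise in the frequency domain, $\overline{\,\cdot\,}$ denoting complex conjugation):
$$S = \mathcal{F}^{-1}\!\left(\frac{\mathcal{F}(I) + \beta\left(\overline{\mathcal{F}(\partial_x)}\,\mathcal{F}(h) + \overline{\mathcal{F}(\partial_y)}\,\mathcal{F}(v)\right)}{\mathcal{F}(\mathbf{1}) + \beta\left(|\mathcal{F}(\partial_x)|^2 + |\mathcal{F}(\partial_y)|^2\right)}\right).$$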
3
votes
2 answers
Gradient descent overshoot - why does it diverge?
I'm thinking about gradient descent, but I don't get it.
I understand that it can overshoot the minimum when the learning rate is too large. But I can't understand why it would diverge.
Let's say we have
$$J(\theta_0, \theta_1) =…
user47979
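A one-dimensional example makes the divergence concrete. Take $J(\theta) = c\,\theta^2$ with $c > 0$; then
$$\theta_{t+1} = \theta_t - \alpha J'(\theta_t) = (1 - 2\alpha c)\,\theta_t, \qquad |\theta_t| = |1 - 2\alpha c|^{\,t}\,|\theta_0|,$$
so once $\alpha > 1/c$ the factor $|1 - 2\alpha c|$ exceeds $1$: each step overshoots to a point with a larger gradient, and the iterates grow geometrically instead of converging.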
3
votes
1 answer
Mathematical optimization with thresholded optimization function
Gradient descent can be used to minimize an objective function $\Phi:\mathbb{R}^d \to \mathbb{R}$, if we know how to evaluate $\Phi$ on any input of our choice.
However, my situation is a little different. I have an objective function $\Phi$ of the…
D.W.
- 167,959
- 22
- 232
- 500
2
votes
1 answer
Is there a universal learning rate for neural networks?
I'm currently creating a neural network with backpropagation/gradient descent. There is a hyperparameter called the "learning rate" (η), which has to be chosen so as not to overshoot the minimum of the cost function when doing…
LU15.W1R7H
- 23
- 2
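One standard piece of context (not a universal constant): if the cost $J$ is $L$-smooth, the descent lemma gives
$$J\!\left(\theta - \eta\,\nabla J(\theta)\right) \le J(\theta) - \eta\left(1 - \tfrac{L\eta}{2}\right)\|\nabla J(\theta)\|^2,$$
which guarantees progress only for $\eta < 2/L$; since $L$ depends on the network and the data, the admissible learning rate is problem-dependent rather than universal.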
2
votes
0 answers
About gradient descent on non-convex functions
There is this "folklore" result that gradient descent on a non-convex function takes $O(\frac n {\epsilon^2})$ steps to get to a point whose gradient norm is below $\epsilon$ and with SGD this takes $O(\frac {1}{\epsilon^4})$ steps.
Can someone…
gradstudent
- 493
- 2
- 8
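For reference, the gradient-descent half of the folklore bound follows from smoothness alone: for an $L$-smooth $f$ with step size $1/L$,
$$f(x_{t+1}) \le f(x_t) - \frac{1}{2L}\,\|\nabla f(x_t)\|^2 \quad\Longrightarrow\quad \min_{t < T}\,\|\nabla f(x_t)\|^2 \le \frac{2L\left(f(x_0) - f^\ast\right)}{T},$$
so $T = O(1/\epsilon^2)$ iterations suffice to find a point with gradient norm at most $\epsilon$.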
2
votes
1 answer
Calculating gradient in a neural net using batches
I am a CS student learning about neural nets. Currently I am confused about how to train a neural net in batches. If I calculate the error over a batch, I get a vector of errors, e.g. real1 − predicted1, real2 − predicted2, etc. How do I then…
swedishfished
- 121
- 1
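A minimal illustration with a linear model and squared error (the model is an assumption, chosen only to make the averaging explicit): the per-example errors are not combined directly; each example contributes a gradient, and those gradients are averaged over the batch.

```python
import numpy as np

def batch_gradient(w, X, y):
    """Average gradient of the squared error over a batch given as rows of X."""
    errors = X @ w - y                 # one error per example
    # Per-example gradient is 2 * error_i * x_i; X.T @ errors sums them over the batch.
    return 2 * X.T @ errors / len(y)

w = np.zeros(3)
X = np.random.randn(8, 3)              # a batch of 8 examples, 3 features
y = np.random.randn(8)
w -= 0.1 * batch_gradient(w, X, y)     # one gradient-descent step
```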
2
votes
0 answers
Lazy Stochastic Gradient Descent: Multiplicative vs Additive
I am reading Bob Carpenter's note at http://lingpipe.files.wordpress.com/2008/04/lazysgdregression.pdf and William Cohen's note at http://www.cs.cmu.edu/~wcohen/10-605/notes/sgd-notes.pdf.
They described the same technique to lazily decay the…
user59369
- 21
- 1
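Without restating either note, the general idea behind the multiplicative variant can be sketched as follows (variable names and the exact update form are illustrative assumptions, not taken from the linked notes): record when each weight was last touched and apply all missed decay steps at once when its feature reappears.

```python
def lazy_update(w, last_step, t, feature_ids, grads, eta, lam):
    """One SGD step that touches only the features active at step t."""
    for j, g in zip(feature_ids, grads):
        # Catch up on the (t - last_step[j]) decay steps this weight missed.
        w[j] *= (1 - eta * lam) ** (t - last_step[j])
        w[j] -= eta * g                # the usual gradient step for this feature
        last_step[j] = t
    return w
```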
2
votes
0 answers
Computing $\mathrm{tr}(X^{-1}Y)$ efficiently
I know that one can compute the expression $X^{-1}\mathbf{v}$ quickly with conjugate gradient method. Is there a similar approach for computing $\mathrm{tr}(X^{-1}Y)$?
Similarly interesting to me are $\mathrm{tr}(X^{-1})$ and…
R S
- 129
- 2
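One common approach, sketched here under the assumption that $X$ is symmetric positive definite, is a Hutchinson-type estimator: $\mathrm{tr}(X^{-1}Y) = \mathbb{E}\!\left[z^\top X^{-1} Y z\right]$ for random $z$ with $\mathbb{E}[zz^\top] = I$, so each sample costs one matrix-vector product with $Y$ and one CG solve with $X$.

```python
import numpy as np
from scipy.sparse.linalg import cg

def trace_inv_times(X, Y, n_samples=50, seed=0):
    """Stochastic estimate of tr(X^{-1} Y) for symmetric positive-definite X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=n)    # Rademacher probe vector
        u, _ = cg(X, Y @ z)                    # u = X^{-1} (Y z) via conjugate gradient
        total += z @ u                         # one sample of z^T X^{-1} Y z
    return total / n_samples
```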