Questions tagged [backpropagation]

Backpropagation, or "backward propagation of errors," is an algorithm for the supervised learning of artificial neural networks using gradient descent.

Given an artificial neural network and an error function, the backpropagation method calculates the gradient of the error function with respect to the neural network's weights. It is a generalization of the delta rule for perceptrons to multilayer feedforward neural networks.
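As a rough sketch of how that gradient is computed (using one common notation, with pre-activations $z^{(\ell)} = W^{(\ell)} a^{(\ell-1)} + b^{(\ell)}$, activations $a^{(\ell)} = \sigma(z^{(\ell)})$, and error $E$): the chain rule is applied backwards, layer by layer,
$$\delta^{(L)} = \nabla_{a^{(L)}} E \odot \sigma'\!\left(z^{(L)}\right), \qquad \delta^{(\ell)} = \left(W^{(\ell+1)}\right)^{\!\top} \delta^{(\ell+1)} \odot \sigma'\!\left(z^{(\ell)}\right), \qquad \frac{\partial E}{\partial W^{(\ell)}} = \delta^{(\ell)} \left(a^{(\ell-1)}\right)^{\!\top}.$$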

31 questions
6 votes · 3 answers

How to add the derivative of a matrix to the chain rule?

In machine learning, I'm optimizing a parameter matrix $W$. The loss function is $$L=f(y),$$ where $L$ is a scalar, $y=Wx$, $x\in \mathbb{R}^n$, $y\in \mathbb{R}^m$, and $W$ is an $m\times n$ matrix. In all math textbooks, it is…
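For reference, the standard result being asked about (a sketch, treating $\partial L/\partial y \in \mathbb{R}^m$ as a column vector) is
$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial y}\, x^{\top} \in \mathbb{R}^{m \times n},$$
since $\partial L/\partial W_{ij} = (\partial L/\partial y_i)\, x_j$; the input $x$ enters the chain rule through this outer product rather than as an extra Jacobian factor.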
3 votes · 0 answers

Estimating the computational complexity of an algorithm with more than one choice

The problem: We have a weighted graph $G=(V, E, W)$ with $|V| = n$, $|E| = n-1$, where $W$ is the set of edge weights. The graph $G$ includes one ring on $n_1 \geq 3$ nodes and $n_2$ isolated nodes, $n_1+ n_2 = n$. We want to connect the isolated nodes to the…
3 votes · 1 answer

Derivative of Mean Square Error Function with respect to output

I'm trying to understand the gradient derivation for the back-propagation algorithm. I'm having trouble computing the explicit derivative of the mean squared error loss function with respect to the output value in a regression setting. I only have…
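For reference, the standard result (a sketch, for a single prediction $\hat{y}$ with target $y$ and the common $\tfrac{1}{2}$ convention):
$$L = \tfrac{1}{2}\left(\hat{y} - y\right)^2 \;\Longrightarrow\; \frac{\partial L}{\partial \hat{y}} = \hat{y} - y;$$
without the $\tfrac{1}{2}$ the derivative is $2(\hat{y} - y)$, and averaging over a batch of $N$ samples just divides by $N$.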
2 votes · 0 answers

What do I do once I have the Jacobian Matrix from Softmax Derivative

I am teaching myself artificial intelligence from scratch, without libraries, and I have a decent handle on most of it. UPDATE/EDIT: I am lost, however, on the next step mathematically after deriving the softmax activation function, as an example, to hopefully…
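A minimal numpy sketch of the usual next step (the question mentions working without libraries; numpy is used here purely for illustration, and the array values are hypothetical): the softmax Jacobian $J$, with $J_{ij} = p_i(\delta_{ij} - p_j)$, is not used on its own, you multiply the upstream gradient $\partial L/\partial \mathbf{a}$ by it (a vector-Jacobian product) to get $\partial L/\partial \mathbf{z}$ for the rest of backpropagation.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; this does not change the output.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_jacobian(p):
    # J[i, j] = p_i * (delta_ij - p_j), built from the softmax output p.
    return np.diag(p) - np.outer(p, p)

# The "next step" is a vector-Jacobian product: chain the upstream gradient
# dL/da (gradient of the loss w.r.t. the softmax output a) through J to get
# the gradient w.r.t. the pre-activations z. J is symmetric, so J.T == J.
z = np.array([1.0, 2.0, 0.5])          # hypothetical pre-activations
p = softmax(z)
dL_da = np.array([0.2, -0.7, 0.5])     # hypothetical upstream gradient dL/da
dL_dz = softmax_jacobian(p) @ dL_da    # feeds backpropagation into earlier layers
print(dL_dz)
```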
2 votes · 2 answers

Partial derivative with respect to a matrix in RNN backpropagation

I have an issue with the following problem. I am trying to derive the gradients with respect to $x_t, h_{t-1}, W_x, W_h$. Here $x_t$ is $N \times D$, $h_t$ is $N \times H$, $W_h$ is an $H \times H$ matrix, and $W_x$ is a $D \times H$ matrix. The function is…
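As a sketch, assuming the truncated function is the usual vanilla-RNN step $h_t = \tanh(x_t W_x + h_{t-1} W_h)$ (an assumption, not necessarily the asker's exact setup) and writing $G = \partial L/\partial h_t$ (an $N \times H$ matrix) and $\tilde{G} = G \odot (1 - h_t^2)$, the dimension-consistent gradients are
$$\frac{\partial L}{\partial W_x} = x_t^{\top}\tilde{G}, \qquad \frac{\partial L}{\partial W_h} = h_{t-1}^{\top}\tilde{G}, \qquad \frac{\partial L}{\partial x_t} = \tilde{G}\, W_x^{\top}, \qquad \frac{\partial L}{\partial h_{t-1}} = \tilde{G}\, W_h^{\top}.$$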
2 votes · 0 answers

How to calculate the upper bound of the gradient of a multi-layer ReLU neural network?

Layers: We shall denote in the following the layer number by the superscript $\ell$. We have $\ell=0$ for the input layer, $\ell=1$ for the first hidden layer, and $\ell=L$ for the output layer. The number of neurons in layer $\ell$ is…
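One common bound, sketched with generic notation since the excerpt is cut off: with ReLU activations each backward step multiplies by $(W^{\ell})^{\top}$ and a 0/1 diagonal mask of operator norm at most $1$, so for a scalar network output $f(x)$
$$\left\lVert \nabla_x f(x) \right\rVert_2 \;\le\; \prod_{\ell=1}^{L} \left\lVert W^{\ell} \right\rVert_2,$$
i.e. the product of the spectral norms of the weight matrices upper-bounds the gradient.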
2 votes · 1 answer

Root finding and automatic differentiation

Consider the equation $z = f (z, x)$. We would like to find $z^{\star}$ for $f$ such that $z^{\star} = f (z^{\star}, x)$. One way to do this problem is through naive iteration: $z^{(k + 1)} = f (z^{(k)}, x)$; stop when $z^{(k + 1)} \approx…
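The standard way to differentiate through such a fixed point without unrolling the iteration is implicit differentiation (a sketch, assuming $I - \partial f/\partial z$ is invertible at the solution): differentiating $z^{\star}(x) = f(z^{\star}(x), x)$ gives
$$\frac{\mathrm{d} z^{\star}}{\mathrm{d} x} = \left( I - \frac{\partial f}{\partial z}\bigg|_{(z^{\star}, x)} \right)^{-1} \frac{\partial f}{\partial x}\bigg|_{(z^{\star}, x)},$$
so only the converged $z^{\star}$ is needed for the backward pass, not every iterate $z^{(k)}$.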
2 votes · 1 answer

Jacobian Matrix of an Elementwise Operation on a Matrix

From ref. 1 it is clear that when you have an elementwise operation on a vector, the Jacobian matrix of the function with respect to its input vector is a diagonal matrix. For an input vector $\textbf{x} = \{x_1, x_2, \dots, x_n\}$ on which an elementwise…
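For the matrix case the same structure carries over (a sketch): if $Y = \phi(X)$ is applied elementwise to $X \in \mathbb{R}^{m \times n}$, then with respect to the flattened inputs the Jacobian is the $mn \times mn$ diagonal matrix
$$\frac{\partial\, \mathrm{vec}(Y)}{\partial\, \mathrm{vec}(X)} = \mathrm{diag}\!\left(\phi'\!\left(\mathrm{vec}(X)\right)\right),$$
which is why the Jacobian is never materialized in practice and the backward pass is just the elementwise product $\partial L/\partial X = \partial L/\partial Y \odot \phi'(X)$.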
2 votes · 0 answers

Matrix Derivation for Neural Network Formula

I am learning some of the inner workings of neural networks, but I have a problem with the matrix derivation for backpropagation. On the assumption that the formula for calculating one node in a neural network, which has been vectorized, is $Z^{[i]} =…
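Assuming the truncated formula is the usual $Z^{[i]} = W^{[i]} A^{[i-1]} + b^{[i]}$ with $A^{[i]} = g\!\left(Z^{[i]}\right)$ and examples stacked as columns (an assumption, sketched here for reference), the matrix derivatives used in backpropagation are
$$\frac{\partial L}{\partial W^{[i]}} = \frac{\partial L}{\partial Z^{[i]}} \left(A^{[i-1]}\right)^{\!\top}, \qquad \frac{\partial L}{\partial A^{[i-1]}} = \left(W^{[i]}\right)^{\!\top} \frac{\partial L}{\partial Z^{[i]}}, \qquad \frac{\partial L}{\partial b^{[i]}} = \sum_{\text{columns}} \frac{\partial L}{\partial Z^{[i]}}.$$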
2 votes · 2 answers

Backpropagate through stochastic node

It's commonly said that in a VAE we use the reparameterization trick because "we can't backpropagate through a stochastic node." It makes sense from the picture, but I found it hard to understand exactly what it means and why. Let's say $X \sim N(\mu, 1)$. And we…
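The usual resolution, sketched for this one-dimensional case: a sample drawn as $X \sim \mathcal{N}(\mu, 1)$ by an opaque sampler has no usable derivative $\partial X/\partial \mu$, but rewriting the same sample as a deterministic function of $\mu$ and independent noise,
$$X = \mu + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, 1), \qquad \frac{\partial X}{\partial \mu} = 1,$$
moves the randomness off the path from the parameter to the loss, so gradients flow through $\mu$ while $\varepsilon$ is treated as a fixed input.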
1 vote · 1 answer

Derivative of the Cross Entropy loss function with the Softmax function

I am currently teaching myself the basics of neural networks and backpropagation, but I do not understand some steps in the derivation of the derivative of the cross-entropy loss function with the softmax activation function. Given the loss…
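For reference, the end result of that derivation (a sketch, assuming a one-hot target $y$, softmax output $p_i = e^{z_i}/\sum_k e^{z_k}$, and loss $L = -\sum_i y_i \log p_i$): chaining the two derivatives collapses to
$$\frac{\partial L}{\partial z_j} = p_j - y_j,$$
which is why the softmax and cross-entropy derivatives are almost always derived together.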
1 vote · 1 answer

Why is the numerator-layout Jacobian transposed in backpropagation calculation?

In the derivation of the backpropagation algorithm in Neural Network Design by Hagan et al., we consider the derivative of the scalar-valued sample loss function $\hat{F}$ with respect to a vector of "sensitivities" $\mathbf{n}^{m}$ in layer $m$ of…
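The short version of the usual answer, sketched in that book's notation with sensitivities $\mathbf{s}^{m} = \partial \hat{F}/\partial \mathbf{n}^{m}$: because $\hat{F}$ is a scalar, its gradient is pulled back through each layer by a vector-Jacobian product, and writing that product with column-vector gradients forces the transpose,
$$\mathbf{s}^{m} = \left( \frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}} \right)^{\!\top} \mathbf{s}^{m+1}.$$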
1 vote · 1 answer

How to derive the expression for the gradient in BPTT

I have the following problem: I am trying to derive final expressions for the error gradients in a simple recurrent neural network (backpropagation through time, BPTT). The parameters and state-update equations are the following: $\mathbf{x}_t \in R^n,…
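As a sketch, assuming the truncated update is a standard vanilla RNN, $\mathbf{h}_t = \tanh\!\left(W_{hx}\mathbf{x}_t + W_{hh}\mathbf{h}_{t-1}\right)$ with per-step losses $L_t$ (an assumption about the cut-off equations), the BPTT gradient for the recurrent weights sums over time steps and over the paths from each later loss back to step $k$:
$$\frac{\partial L}{\partial W_{hh}} = \sum_{t} \sum_{k \le t} \frac{\partial L_t}{\partial \mathbf{h}_t} \left( \prod_{j=k+1}^{t} \frac{\partial \mathbf{h}_j}{\partial \mathbf{h}_{j-1}} \right) \frac{\partial^{+} \mathbf{h}_k}{\partial W_{hh}},$$
where $\partial^{+}$ denotes the immediate partial derivative that treats $\mathbf{h}_{k-1}$ as a constant.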
1 vote · 1 answer

Deriving backpropagation equations - vectorization (regression)

I have a huge problem trying to derive the backpropagation equations. All the solutions I've found online are not as detailed as I'd like, hence I'm here asking for your help. First of all, sorry for this long preface, but I think it's necessary in order to…
1 vote · 0 answers

What is the derivative of the Softmax function AFTER subtracting the maximum value from each input?

I'm using the Softmax function as the activation function for the last layer of a neural network I am trying to code up. The function takes in a vector of elements, $\vec{z}$, where the length of $\vec{z}$ is $L$. The function returns the…
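For reference (a sketch): subtracting the maximum only improves numerical stability and does not change the function, because the softmax is invariant to adding any constant $c$ to every input,
$$\frac{e^{z_i - c}}{\sum_{k} e^{z_k - c}} = \frac{e^{-c}\, e^{z_i}}{e^{-c} \sum_{k} e^{z_k}} = \frac{e^{z_i}}{\sum_{k} e^{z_k}},$$
and this cancellation holds even when $c = \max_k z_k$ depends on $\vec{z}$, so the Jacobian is the same as for the un-shifted softmax: $\partial p_i/\partial z_j = p_i\left(\delta_{ij} - p_j\right)$.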