
Consider the equation $z = f (z, x)$. We would like to find a fixed point $z^{\star}$ of $f$ such that $z^{\star} = f (z^{\star}, x)$.

One way to solve this problem is naive iteration: $z^{(k + 1)} = f (z^{(k)}, x)$; stop when $z^{(k + 1)} \approx z^{(k)}$.
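As a concrete sketch of the naive iteration (using a toy scalar $f$ of my own choosing, not anything from the tutorial):

```python
import math

# Toy scalar example: f(z, x) = 0.5*cos(z) + x is a contraction in z
# (|df/dz| <= 0.5), so naive fixed-point iteration converges.
def f(z, x):
    return 0.5 * math.cos(z) + x

def fixed_point_naive(x, z0=0.0, tol=1e-10, max_iter=1000):
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if abs(z_next - z) < tol:   # stop when z_{k+1} ~ z_k
            return z_next
        z = z_next
    return z

z_star = fixed_point_naive(x=1.0)
print(abs(z_star - f(z_star, 1.0)))  # ~0: z* satisfies z* = f(z*, x)
```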

A faster way is to rearrange the equation as $g (z) = f (z, x) - z$ and apply Newton's root-finding method: $$z^{(k + 1)} = z^{(k)} - \left( \frac{\partial g (z^{(k)})}{\partial z} \right)^{- 1} \cdot g (z^{(k)})$$ Having found $z^{\star}$, suppose we would like to find the derivative of some loss function $l$ with respect to $x$. This can be done via the implicit function theorem: $$\frac{\partial l}{\partial x} = \frac{\partial l}{\partial z^{\star}} \frac{\partial z^{\star}}{\partial x} = \frac{\partial l}{\partial z^{\star}} \cdot \left( - \left( \frac{\partial g}{\partial z^{\star}} \right)^{- 1} \frac{\partial g}{\partial x} \right)$$ Now suppose we would like to do this using a software package that implements automatic differentiation, such as PyTorch or JAX.
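In the scalar case the Newton update and the implicit derivative can be written out directly. A minimal sketch, again assuming the toy $f(z, x) = 0.5\cos(z) + x$ (my example, not the tutorial's), where $\partial g/\partial x = \partial f/\partial x = 1$:

```python
import math

def f(z, x):
    return 0.5 * math.cos(z) + x

def dg_dz(z):
    return -0.5 * math.sin(z) - 1.0       # d/dz [f(z, x) - z]

def newton_solve(x, z0=0.0, tol=1e-12, max_iter=50):
    z = z0
    for _ in range(max_iter):
        g = f(z, x) - z
        if abs(g) < tol:
            break
        z = z - g / dg_dz(z)              # z - (dg/dz)^{-1} * g
    return z

# Implicit gradient: dz*/dx = -(dg/dz)^{-1} * dg/dx, with dg/dx = 1 here.
z_star = newton_solve(x=1.0)
dz_dx = -(1.0 / dg_dz(z_star)) * 1.0

# Check against a finite difference of the solver itself:
eps = 1e-6
fd = (newton_solve(1.0 + eps) - newton_solve(1.0 - eps)) / (2 * eps)
print(dz_dx, fd)  # the two agree closely
```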

Automatic differentiation has a forward pass and a backward pass. In the forward pass we simply iterate $z^{(k + 1)} = z^{(k)} - J (z^{(k)})^{- 1} \cdot g (z^{(k)})$ and save the output of each iteration. In the backward pass, we evaluate the derivative $\frac{\partial l}{\partial x}$ by chaining through each of the iterations performed in the forward pass. We can think of this as unrolling the forward pass and propagating the gradient from the output back to the input.
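The "unrolling" can be made concrete by differentiating the iteration itself: each step $z^{(k+1)} = f(z^{(k)}, x)$ contributes $\frac{\partial z^{(k+1)}}{\partial x} = \frac{\partial f}{\partial z}\frac{\partial z^{(k)}}{\partial x} + \frac{\partial f}{\partial x}$. A sketch with the same toy $f$ as above (my example), carrying this derivative through the naive iteration:

```python
import math

# Differentiate "through the solver": carry dz/dx alongside z through
# z_{k+1} = f(z_k, x), with f(z, x) = 0.5*cos(z) + x, so
# df/dz = -0.5*sin(z) and df/dx = 1.
def unrolled_grad(x, z0=0.0, n_iter=100):
    z, dz = z0, 0.0
    for _ in range(n_iter):
        df_dz = -0.5 * math.sin(z)
        dz = df_dz * dz + 1.0             # chain rule through one iteration
        z = 0.5 * math.cos(z) + x
    return z, dz

z_star, dz_unrolled = unrolled_grad(1.0)

# The implicit formula needs only z*: dz*/dx = 1 / (1 + 0.5*sin(z*)).
dz_implicit = 1.0 / (1.0 + 0.5 * math.sin(z_star))
print(dz_unrolled, dz_implicit)  # both converge to the same value
```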

What I've just described is the standard way automatic differentiation is used with the backpropagation algorithm. The problem here is that we need to backpropagate through all the steps we did in the forward phase. This is not only time-consuming but also requires us to store the outputs of all the iterations, since they are needed in the backward pass.

I was reading this tutorial: http://implicit-layers-tutorial.org/implicit_functions/, where the author says that if we do implicit differentiation, we don't need to save the intermediate values, because the only Jacobian we need (the term $\frac{\partial g}{\partial z^{\star}}$) is the Jacobian at the solution point. This is kind of a big deal: it means that if you can reformulate your function as an implicit function, the backward phase of backpropagation becomes essentially free. My issue is that I don't fully understand why defining the function as an implicit function lets us avoid backpropagating through the solver. I would appreciate it if someone could enlighten me.
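The key observation is that $z^{\star}$ satisfies $g(z^{\star}, x) = 0$ exactly, no matter which solver produced it or how many steps it took, so the backward pass can use the implicit formula at $z^{\star}$ alone. A minimal sketch of such a custom backward pass (my own toy scalar example, not the tutorial's code), with loss $l(z) = z^2$:

```python
import math

# Forward pass: solve z = f(z, x) by any method, keep ONLY z*.
# Backward pass: turn dl/dz* into dl/dx via the implicit-function formula,
# never touching the solver's intermediate iterates.
def f(z, x):
    return 0.5 * math.cos(z) + x

def solve(x, z0=0.0, n_iter=200):
    z = z0
    for _ in range(n_iter):
        z = f(z, x)
    return z                                 # intermediates are discarded

def backward(z_star, dl_dz):
    dg_dz = -0.5 * math.sin(z_star) - 1.0    # d/dz [f - z] at the solution
    dg_dx = 1.0                              # d/dx [f - z]
    return dl_dz * (-(1.0 / dg_dz) * dg_dx)  # dl/dx

# Loss l(z) = z^2, so dl/dz* = 2*z*.
x = 1.0
z_star = solve(x)
dl_dx = backward(z_star, 2.0 * z_star)

# Finite-difference check of d/dx [solve(x)^2]:
eps = 1e-6
fd = (solve(x + eps) ** 2 - solve(x - eps) ** 2) / (2 * eps)
print(dl_dx, fd)  # should match closely
```

In PyTorch or JAX this is what a custom backward rule (e.g. `torch.autograd.Function` or `jax.custom_vjp`) would implement: the forward pass runs the solver outside the autodiff tape, and the backward pass uses only $z^{\star}$.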

  • Are you saying that you have the implicit curve $z-f(z,x)=0$? This means that $z=z(x)$, at least locally. Then you want to find a constant $z^*$ such that $z^*-f(z^*,x)=0$ for any $x$? – José C Ferreira Apr 06 '22 at 17:44

1 Answer


If you have an equation $u=f(u,v)$, then you have an implicit function $g(u,v)=u-f(u,v)=0$, assuming the conditions of the implicit function theorem hold.

If you pick some $v_0$, then you can try to solve the equation $u=f(u,v_0)$ numerically. Let's assume you find $u_k\approx u^*$, where $u^*$ is the exact solution. Since the functions in this context are continuously differentiable, the partial derivatives of $g(u,v)$ at the point $(u_k,v_0)$ are close to their respective partial derivatives at the point $(u^*,v_0)$, including those given by the implicit function theorem.
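This continuity claim is easy to check numerically. A quick sketch with a toy $g(u,v) = u - f(u,v)$ where $f(u,v) = 0.5\cos(u) + v$ (my own example, not from the answer):

```python
import math

# g(u, v) = u - f(u, v) with f(u, v) = 0.5*cos(u) + v.
# An approximate root u_k gives partial derivatives of g close
# to those at the exact root u*.
def dg_du(u):
    return 1.0 + 0.5 * math.sin(u)    # d/du [u - f(u, v)]

u_star = 1.0
for _ in range(200):                  # solve u = f(u, 1.0) very accurately
    u_star = 0.5 * math.cos(u_star) + 1.0

u_k = u_star + 1e-4                   # a nearby approximate solution
print(abs(dg_du(u_k) - dg_du(u_star)))  # small: the derivative barely moves
```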

You can find many discussions searching for "(f(u,v)=0) implicit function " on SearchOnMath.