I intend to give some glimpses, like the one I gave here.
Let us consider the minimization problem
$$0=g(\textbf{a})=\min_{\textbf{x}\in A}{g(\textbf{x})},\qquad {g(\textbf{x})}=\frac{1}{2}\|{\bf f}({\bf x})\|^2,$$ for a continuously differentiable function $\textbf{f}:A\to \mathbb{R}^p$, where $A$ is an open subset of $\mathbb{R}^m$ containing $\textbf{a}$. Now, if you have a differentiable curve $\textbf{u}:(a,b)\to A$, you can apply the chain rule to obtain
$$\frac{d\, g({\bf u}(t))}{dt}= \left\langle {\bf u}'(t), \nabla g({\bf u}(t))\right\rangle= \left\langle {\bf u}'(t),[J{\bf f}({\bf u}(t))]^*{\bf f}({\bf u}(t))\right\rangle=\left\langle J{\bf f}({\bf u}(t)){\bf u}'(t),{\bf f}({\bf u}(t))\right\rangle,$$ in which $\langle \cdot,\cdot\rangle$ denotes the inner product.
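For instance, here is a quick numerical sanity check of the identity $\nabla g({\bf x})=[J{\bf f}({\bf x})]^*{\bf f}({\bf x})$, comparing it against central finite differences of $g$; the map ${\bf f}$ below is just a toy example of my own.

```python
import numpy as np

def f(x):
    # a made-up smooth test map f: R^2 -> R^3
    return np.array([x[0]**2 - x[1], np.sin(x[0]) + x[1]**3, x[0]*x[1] - 1.0])

def jac_f(x):
    # Jacobian of the test map, computed by hand
    return np.array([[2*x[0], -1.0],
                     [np.cos(x[0]), 3*x[1]**2],
                     [x[1], x[0]]])

def g(x):
    # g(x) = 0.5 * ||f(x)||^2
    return 0.5 * np.dot(f(x), f(x))

x = np.array([0.7, -0.3])
grad_analytic = jac_f(x).T @ f(x)   # [Jf(x)]^* f(x)

# central finite differences of g
eps = 1e-6
grad_fd = np.array([(g(x + eps*e) - g(x - eps*e)) / (2*eps) for e in np.eye(2)])

print(grad_analytic)
print(grad_fd)   # the two vectors should agree to many decimal places
```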
A natural choice for ${\bf u}(t)$ is given by the initial value problem (IVP) $$\left\{\begin{array}{rcl}{\bf u}'(t)&=&-\alpha \nabla g({\bf u}(t))\\ {\bf u}(0)&=&{\bf u}_0\end{array}\right.,$$
where $[J{\bf f}({\bf u}(t))]^* {\bf f}({\bf u}(t))=\nabla g({\bf u}(t))$, and $\alpha>0$.
If you use Euler's method to solve this IVP numerically, you obtain the gradient descent method. This method, with step size $h_j$, takes the form
$${\bf u}_{j+1}=\phi({\bf u}_j),$$ with
$$\phi({\bf u})={\bf u}-h_j\alpha\left[J{\bf f}({\bf u})\right]^*{\bf f}({\bf u}),$$ as a fixed point iteration to solve $${\bf f}({\bf a})={\bf 0},\qquad \phi({\bf a})={\bf a}.$$ It converges when $$\|\phi'({\bf a})\|=\max_{1\leq i\leq m}|1-h_j\alpha s_i^2|<1,$$ provided you have a good choice of ${\bf u}_0$, where the $s_i$ are the singular values of $J{\bf f}({\bf a})$.
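Here is a minimal sketch of this iteration in Python, with a toy map ${\bf f}$ and a fixed step $h_j\alpha$ of my own choosing; it also evaluates the contraction factor $\max_i|1-h_j\alpha s_i^2|$ at the computed limit.

```python
import numpy as np

def f(x):
    # toy test map f: R^2 -> R^2 with a root at (1, 1)
    return np.array([x[0]**2 - x[1], x[0] + x[1] - 2.0])

def jac_f(x):
    return np.array([[2*x[0], -1.0],
                     [1.0, 1.0]])

def g(x):
    return 0.5 * np.dot(f(x), f(x))

alpha, h = 1.0, 0.1            # assumed values; h*alpha must satisfy the contraction condition
u = np.array([2.0, 0.0])       # initial guess u_0, assumed "good enough"

for j in range(200):
    # u_{j+1} = u_j - h*alpha * [Jf(u_j)]^* f(u_j)
    u = u - h * alpha * (jac_f(u).T @ f(u))

print(u, g(u))                 # u approaches the root (1, 1) and g(u) goes to 0

# contraction factor max_i |1 - h*alpha*s_i^2| at the computed limit
s = np.linalg.svd(jac_f(u), compute_uv=False)
print(np.max(np.abs(1.0 - h * alpha * s**2)))   # about 0.83 here, i.e. < 1
```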
The following inequality holds:
$$\frac{d\, g({\bf u}(t))}{dt}= -\alpha\|\nabla g({\bf u}(t))\|^2\leq -2\alpha \sigma_{min}(t)^2g({\bf u}(t))\leq 0,$$ using the estimate
$$\|\nabla g({\bf u}(t))\|^2=\|[J{\bf f}({\bf u}(t))]^*{\bf f}({\bf u}(t))\|^2\geq \sigma_{min}(t)^2\|{\bf f}({\bf u}(t))\|^2,$$ where $\sigma_{min}(t)$ is the smallest singular value of $J{\bf f}({\bf u}(t))$. Integrating this differential inequality (Grönwall's lemma) gives $$g({\bf u}(t))\leq g({\bf u}(0))e^{-2\alpha \lambda(t)},\qquad \lambda(t)=\int_0^t\sigma_{min}(s)^2\,ds.$$
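To see this estimate in action, the following sketch integrates the gradient flow with small Euler steps (same kind of toy ${\bf f}$ as above, my choice) and compares $g({\bf u}(t))$ with the bound $g({\bf u}(0))e^{-2\alpha\lambda(t)}$.

```python
import numpy as np

def f(x):
    return np.array([x[0]**2 - x[1], x[0] + x[1] - 2.0])

def jac_f(x):
    return np.array([[2*x[0], -1.0],
                     [1.0, 1.0]])

def g(x):
    return 0.5 * np.dot(f(x), f(x))

alpha, dt, n = 1.0, 1e-3, 3000        # integrate up to t = n*dt = 3
u = np.array([2.0, 0.0])
g0, lam = g(u), 0.0

for k in range(n):
    J = jac_f(u)
    sigma_min = np.linalg.svd(J, compute_uv=False).min()
    lam += sigma_min**2 * dt          # Riemann sum for lambda(t)
    u = u - dt * alpha * (J.T @ f(u)) # Euler step of u' = -alpha*grad g(u)

print(g(u), g0 * np.exp(-2 * alpha * lam))   # g(u(t)) stays below the bound, up to discretization error
```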
Another choice is the curve satisfying the initial value problem (IVP) $$\left\{\begin{array}{rrl}J{\bf f}({\bf u}(t)){\bf u}'(t)&=&-\alpha {\bf f}({\bf u}(t))\\ {\bf u}(0)&=&{\bf u}_0\end{array}\right.,$$ for some $\alpha>0$. You find that $$\frac{d\, g({\bf u}(t))}{dt}= -2\alpha g({\bf u}(t))$$ or $$g({\bf u}(t))=g({\bf u}(0))e^{-2\alpha t}.$$
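Again as a sanity check (toy ${\bf f}$ of my own), integrating this IVP with small Euler steps reproduces the decay $g({\bf u}(t))=g({\bf u}(0))e^{-2\alpha t}$ up to discretization error.

```python
import numpy as np

def f(x):
    return np.array([x[0]**2 - x[1], x[0] + x[1] - 2.0])

def jac_f(x):
    return np.array([[2*x[0], -1.0],
                     [1.0, 1.0]])

def g(x):
    return 0.5 * np.dot(f(x), f(x))

alpha, dt, n = 1.0, 1e-4, 20000       # integrate up to t = n*dt = 2
u = np.array([2.0, 0.0])
g0 = g(u)

for k in range(n):
    # u' = -alpha * [Jf(u)]^{-1} f(u), one small Euler step
    u = u + dt * np.linalg.solve(jac_f(u), -alpha * f(u))

t = n * dt
print(g(u), g0 * np.exp(-2 * alpha * t))   # the two values nearly coincide
```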
In both cases it follows that $g({\bf u}(t))$ is a nonincreasing function. This also means that, if $g({\bf u}(t))> 0$, then $g({\bf u}(t+h))<g({\bf u}(t))$ for $0<h<h_t$, for some sufficiently small $h_t>0$. See the Picard–Lindelöf theorem and Lyapunov stability theory.
If $m=p$ and $J{\bf f}({\bf x})$ has a bounded inverse for all $\textbf{x}\in A$, the previous IVP becomes
$$\left\{\begin{array}{lll}{\bf u}'(t)&=&-\alpha \left[J{\bf f}({\bf u}(t))\right]^{-1}{\bf f}({\bf u}(t))\\ {\bf u}(0)&=&{\bf u}_0\end{array}\right..$$
We can use Euler's method $$\left\{\begin{array}{rll}J{\bf f}({\bf u}_j) {\bf w}_j&=&-\alpha_j {\bf f}({\bf u}_j)\\ {\bf u}_{j+1}&=&{\bf u}_j+{\bf w}_j\end{array}\right.,$$ to solve the previous IVP numerically, where ${\bf u}_0={\bf u}(0)$, $t_{j+1}=t_j+h_j$, $h_j>0$, ${\bf u}(t_{j+1}) \approx {\bf u}_{j+1}$ and $\alpha_j=\alpha h_j$.
This method, with step size $h_j$, takes the form
$${\bf u}_{j+1}=\psi({\bf u}_j),$$ with
$$\psi({\bf u})={\bf u}-h_j\alpha\left[J{\bf f}({\bf u})\right]^{-1}{\bf f}({\bf u}),$$ as a fixed point iteration to solve $${\bf f}({\bf a})={\bf 0},\qquad \psi({\bf a})={\bf a}.$$ It converges when $$\|\psi'({\bf a})\|=|1-h_j\alpha|<1,$$ provided you have a good choice of ${\bf u}_0$.
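Here is a minimal sketch of this iteration with a fixed $\alpha_j$ (the toy map ${\bf f}$ and the value of $\alpha_j$ are my own assumptions); each step solves the linear system $J{\bf f}({\bf u}_j){\bf w}_j=-\alpha_j{\bf f}({\bf u}_j)$.

```python
import numpy as np

def f(x):
    # toy test map with a root at (1, 1)
    return np.array([x[0]**2 - x[1], x[0] + x[1] - 2.0])

def jac_f(x):
    return np.array([[2*x[0], -1.0],
                     [1.0, 1.0]])

def g(x):
    return 0.5 * np.dot(f(x), f(x))

u = np.array([2.0, 0.0])    # u_0
alpha_j = 0.5               # alpha_j = alpha*h_j, kept fixed here; |1 - alpha_j| = 0.5 < 1

for j in range(50):
    w = np.linalg.solve(jac_f(u), -alpha_j * f(u))   # Jf(u_j) w_j = -alpha_j f(u_j)
    u = u + w                                        # u_{j+1} = u_j + w_j
    # g(u) decreases at every step here, since alpha_j is well chosen

print(u, g(u))              # u approaches the root (1, 1)
```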
We call $\alpha_j$ the tuning parameter, as in the gradient descent method, and you should choose it carefully so that $g({\bf u}_{j+1})<g({\bf u}_j)$. Otherwise you can get a "bad" approximation ${\bf u}(t_{j+1}) \approx {\bf u}_{j+1}$ for which $g({\bf u}_{j+1})>g({\bf u}_j)$.
But when things work well, $\alpha_j$ can be taken equal to $1$ for $j$ large enough, which recovers the classical Newton iteration.
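One simple way to pick $\alpha_j$ (my own choice of safeguard, not the only one) is to start each step with $\alpha_j=1$ and halve it until $g({\bf u}_{j+1})<g({\bf u}_j)$; near the root the full step is accepted and you are back to the classical Newton iteration.

```python
import numpy as np

def f(x):
    return np.array([x[0]**2 - x[1], x[0] + x[1] - 2.0])

def jac_f(x):
    return np.array([[2*x[0], -1.0],
                     [1.0, 1.0]])

def g(x):
    return 0.5 * np.dot(f(x), f(x))

u = np.array([10.0, -5.0])                 # a deliberately poor u_0
for j in range(30):
    d = np.linalg.solve(jac_f(u), -f(u))   # full Newton direction
    alpha_j = 1.0
    while g(u + alpha_j * d) >= g(u) and alpha_j > 1e-8:
        alpha_j *= 0.5                     # backtrack until g(u_{j+1}) < g(u_j)
    u = u + alpha_j * d
    if g(u) < 1e-16:
        break

print(u, g(u), alpha_j)                    # alpha_j settles at 1 once u is close to the root (1, 1)
```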
Remark: You can rewrite this text using $$g(\textbf{x})=\frac{1}{2}\|\nabla f(\textbf{x})\|^2$$ instead, when you are working with the implication
$$ \min_{\textbf{x}\in A}{f(\textbf{x})}=f(\textbf{a})\Longrightarrow \nabla f(\textbf{a})=\textbf{0}.$$
The Hessian matrix of $f$ plays the role of $J{\bf f}$ in this case.
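As a sketch of this remark (the scalar test function $f$ below is my own choice), the same damped iteration applied to $\nabla f({\bf x})={\bf 0}$ uses the Hessian in place of $J{\bf f}$.

```python
import numpy as np

def grad_f(x):
    # gradient of f(x) = cosh(x0) + cosh(x1) + 0.5*(x0 - x1)^2
    return np.array([np.sinh(x[0]) + (x[0] - x[1]),
                     np.sinh(x[1]) - (x[0] - x[1])])

def hess_f(x):
    # Hessian of the same f (symmetric positive definite everywhere)
    return np.array([[np.cosh(x[0]) + 1.0, -1.0],
                     [-1.0, np.cosh(x[1]) + 1.0]])

def g(x):
    # g(x) = 0.5 * ||grad f(x)||^2
    return 0.5 * np.dot(grad_f(x), grad_f(x))

u = np.array([2.0, -1.0])
for j in range(50):
    d = np.linalg.solve(hess_f(u), -grad_f(u))   # the Hessian plays the role of Jf
    alpha_j = 1.0
    while g(u + alpha_j * d) >= g(u) and alpha_j > 1e-8:
        alpha_j *= 0.5                           # keep g decreasing
    u = u + alpha_j * d

print(u, g(u))   # u approaches the minimizer (0, 0), where grad f vanishes
```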