The Gauss-Newton method is an approximation of Newton's method specialized to least-squares problems of the form
$$
\underset{\mathbf{x}}{\operatorname{argmin}}\;\mathbf{r}(\mathbf{x})^T\mathbf{r}(\mathbf{x})
$$
In other words, it finds a solution $\mathbf{x}$ that minimizes the squared norm $\|\mathbf{r}(\mathbf{x})\|_2^2$ of a nonlinear vector-valued function $\mathbf{r}$.
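A typical example of such a residual is curve fitting: given data points $(t_i, y_i)$ and a model $f(t;\mathbf{x})$ with parameters $\mathbf{x}$ (the model here is chosen purely for illustration), one stacks the pointwise misfits into
$$
\mathbf{r}(\mathbf{x}) =
\begin{pmatrix}
f(t_1;\mathbf{x}) - y_1 \\
\vdots \\
f(t_m;\mathbf{x}) - y_m
\end{pmatrix},
$$
so that minimizing $\|\mathbf{r}(\mathbf{x})\|_2^2$ is exactly a nonlinear least-squares fit.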
If you look at the update steps of gradient descent and Gauss-Newton applied to the equivalent problem $\frac{1}{2}\mathbf{r}(\mathbf{x})^T\mathbf{r}(\mathbf{x})$, the relationship becomes clear:
Gradient descent
$$
\begin{align}
\mathbf{x}_{n+1} &= \mathbf{x}_n - \mu \nabla\left(\tfrac{1}{2}\mathbf{r}(\mathbf{x}_n)^T\mathbf{r}(\mathbf{x}_n)\right) \\
&= \mathbf{x}_n - \mu\,\mathbf{J}_r^T\mathbf{r}(\mathbf{x}_n)
\end{align}
$$
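As a minimal sketch of this update (assuming hypothetical callables `r_func` and `jac_func` that return $\mathbf{r}(\mathbf{x})$ and the Jacobian $\mathbf{J}_r(\mathbf{x})$ as NumPy arrays):

```python
import numpy as np

def gradient_descent_step(x, r_func, jac_func, mu=1e-2):
    """One gradient-descent update for f(x) = 0.5 * r(x)^T r(x).

    The gradient of f is J_r(x)^T r(x), so the update is x - mu * J^T r.
    """
    r = r_func(x)      # residual vector r(x_n)
    J = jac_func(x)    # Jacobian of r evaluated at x_n
    return x - mu * (J.T @ r)
```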
Gauss-Newton
$$
\begin{align}
\mathbf{x}_{n+1} = \mathbf{x}_n - (\mathbf{J}_r^T\mathbf{J}_r)^{-1}\mathbf{J}_r^T\mathbf{r}(\mathbf{x}_n)
\end{align}
$$
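The Gauss-Newton step looks almost identical, except that the scalar step size $\mu$ is replaced by $(\mathbf{J}_r^T\mathbf{J}_r)^{-1}$. Another minimal sketch under the same assumptions, together with a made-up one-parameter fitting problem to show it in action:

```python
import numpy as np

def gauss_newton_step(x, r_func, jac_func):
    """One Gauss-Newton update: x - (J^T J)^{-1} J^T r.

    The normal equations are solved with np.linalg.solve instead of
    forming the inverse of J^T J explicitly.
    """
    r = r_func(x)
    J = jac_func(x)
    return x - np.linalg.solve(J.T @ J, J.T @ r)

# Made-up illustration: fit y = exp(-a * t) to synthetic data with a = 1.3.
t = np.linspace(0.0, 2.0, 20)
y = np.exp(-1.3 * t)

r_func = lambda x: np.exp(-x[0] * t) - y                      # residuals f(t; a) - y
jac_func = lambda x: (-t * np.exp(-x[0] * t)).reshape(-1, 1)  # d r_i / d a

x = np.array([0.5])
for _ in range(5):
    x = gauss_newton_step(x, r_func, jac_func)
# After a handful of iterations x is approximately [1.3].
```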
The structure of the problem makes it possible to approximate the Hessian used in Newton's method as $\mathbf{H} \approx \mathbf{J}_r^T\mathbf{J}_r$. As you said, in every step the method jumps to the minimum of the second-order Taylor approximation around $\mathbf{x}_n$.
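To see where the approximation $\mathbf{H} \approx \mathbf{J}_r^T\mathbf{J}_r$ comes from, differentiate $f(\mathbf{x}) = \frac{1}{2}\mathbf{r}(\mathbf{x})^T\mathbf{r}(\mathbf{x})$ twice:
$$
\nabla^2 f(\mathbf{x}) = \mathbf{J}_r^T\mathbf{J}_r + \sum_i r_i(\mathbf{x})\,\nabla^2 r_i(\mathbf{x})
$$
Gauss-Newton keeps only the first term; the dropped sum is negligible when the residuals $r_i$ are small (a good fit) or when $\mathbf{r}$ is nearly linear around the current iterate.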
Near a solution, the qualitative behavior is that the approximate second-order (curvature) information lets the iterates take a more direct, less "zigzaggy" path than gradient descent, and convergence is correspondingly faster. Intuitively, the region that is modeled as a quadratic (the one you "jump across" in an iteration) shrinks with every step, and for a sufficiently smooth function the quadratic model becomes more and more accurate over that shrinking region.
However, if the initial guess is far from a solution, the approximation can break down: the neglected second-derivative term of the true Hessian is no longer small, and $\mathbf{J}_r^T\mathbf{J}_r$ can become ill-conditioned or nearly singular. The resulting correction vector can then be extremely long, the quadratic model is no longer accurate across such a step, and since the full step is taken without any line search the method can overshoot and diverge.