There are two different (but related) Newton's methods: one for solving nonlinear equations, and one for optimization. The version for solving nonlinear equations is explained below; the optimization version is described briefly at the end.
Here's a quick explanation of Newton's method for solving $f(x) = 0$, where $f:\mathbb R^N \to \mathbb R^N$ is a differentiable function. Given our current estimate $x_n$ of a solution, ideally we would like to find $\Delta x$ such that
$f(x_n + \Delta x) = 0$. However, rather than solving this condition exactly (which is likely too difficult), we instead use the approximation
$f(x_n + \Delta x) \approx f(x_n) + f'(x_n) \Delta x$, and we find $\Delta x$ such that
$f(x_n) + f'(x_n) \Delta x = 0$. In other words,
we take $\Delta x = -f'(x_n)^{-1} f(x_n)$.
We can hope that $x_{n+1} = x_n + \Delta x$ is an improvement upon $x_n$.
(We have to worry that $f'(x_n)$ might not be invertible, but it can be shown that this won't happen if we start the iteration sufficiently close to a vector $x^*$ such that $f(x^*) = 0$ and $f'(x^*)$ is invertible. A typical local convergence proof also assumes that $f'$ is Lipschitz continuous near $x^*$, which gives quadratic convergence, though I would have to check the exact hypotheses.)
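As a concrete (and purely illustrative) sketch of this iteration, here is a minimal Python/NumPy implementation; the function name `newton_system`, the tolerance, the iteration cap, and the circle-and-line example are my own choices, not anything fixed above. It computes $\Delta x$ by solving $f'(x_n)\,\Delta x = -f(x_n)$ with a linear solve rather than forming $f'(x_n)^{-1}$ explicitly.

```python
import numpy as np

def newton_system(f, jac, x0, tol=1e-10, max_iter=50):
    """Newton's method for f(x) = 0, with f: R^N -> R^N.

    f   : callable returning a length-N array
    jac : callable returning the N x N Jacobian f'(x)
    x0  : initial guess (length-N array)
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        # Newton step: solve f'(x) dx = -f(x) instead of inverting the Jacobian.
        dx = np.linalg.solve(jac(x), -fx)
        x = x + dx
    return x

# Illustrative example: x0^2 + x1^2 = 1 and x0 - x1 = 0 (circle meets line).
f = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
jac = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
print(newton_system(f, jac, x0=[1.0, 0.5]))  # -> approx [0.7071, 0.7071]
```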
By the way, Newton's method for optimization minimizes a twice-differentiable function $g:\mathbb R^N \to \mathbb R$ simply by applying Newton's method for nonlinear systems to the equation $\nabla g(x) = 0$. Since the Jacobian of $\nabla g$ is the Hessian $\nabla^2 g$, the update is $x_{n+1} = x_n - \nabla^2 g(x_n)^{-1} \nabla g(x_n)$.
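Here is a similar hedged sketch of the optimization version (the quadratic test function and the names below are illustrative assumptions). Each step solves $\nabla^2 g(x_n)\,\Delta x = -\nabla g(x_n)$, i.e. the Newton system with the Hessian playing the role of the Jacobian.

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method for minimizing g: R^N -> R by solving grad g(x) = 0.

    grad : callable returning the gradient of g (length-N array)
    hess : callable returning the N x N Hessian of g
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        gx = grad(x)
        if np.linalg.norm(gx) < tol:
            break
        # Newton step: solve Hessian * dx = -gradient.
        x = x + np.linalg.solve(hess(x), -gx)
    return x

# Illustrative example: minimize g(x) = (x0 - 1)^2 + 10 * (x1 + 2)^2.
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_minimize(grad, hess, x0=[5.0, 5.0]))  # -> approx [1.0, -2.0]
```

For a quadratic like this example the Hessian is constant, so a single Newton step lands exactly on the minimizer; this is one way to see why the method converges so quickly once it is near a solution.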