I don't have much knowledge of ML, but from my naive point of view it always seems that some variant of gradient descent is used when training neural networks. I was wondering why more advanced methods don't seem to be used, such as SQP algorithms or interior-point methods. Is it because training a neural net is always a simple unconstrained optimization problem, so the above-mentioned methods would be unnecessary? Any insight would be great, thanks.
1 Answer
In my reply to "Does gradient descent always converge to an optimum?", it is explained that standard gradient descent works well because backtracking gradient descent works well (this is proven in our recent paper mentioned in that post), and that in the long run backtracking gradient descent behaves like standard gradient descent.
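For readers unfamiliar with the term, here is a minimal sketch of backtracking gradient descent, i.e. gradient descent with an Armijo-type line search, in Python. The objective, parameter values, and stopping rule below are my own illustrative choices, not the exact scheme analyzed in the paper referenced above.

```python
# A minimal sketch of backtracking (Armijo) gradient descent, for illustration only.
# The test function and all parameter values are hypothetical choices.
import numpy as np

def backtracking_gd(f, grad, x0, alpha0=1.0, beta=0.5, c=1e-4, tol=1e-8, max_iter=10000):
    """Gradient descent where the step size t is shrunk until the Armijo condition
    f(x - t*g) <= f(x) - c*t*||g||^2 holds, then the step is taken."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stop once the gradient is numerically zero
            break
        t = alpha0
        while f(x - t * g) > f(x) - c * t * np.dot(g, g):
            t *= beta                 # backtrack: shrink the step until sufficient decrease
        x = x - t * g
    return x

# Example: minimize f(x, y) = (x - 3)^2 + 2*(y + 1)^2, whose minimizer is (3, -1).
f = lambda x: (x[0] - 3) ** 2 + 2 * (x[1] + 1) ** 2
grad_f = lambda x: np.array([2 * (x[0] - 3), 4 * (x[1] + 1)])
print(backtracking_gd(f, grad_f, [0.0, 0.0]))  # -> approximately [3., -1.]
```

The point of the backtracking step is that no fixed learning rate has to be chosen in advance; the line search adapts the step size at every iteration, which is what makes the convergence guarantees easier to obtain.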
The main issue with other methods, I think, is that they either require overly strong conditions for convergence or have no convergence proof at all. In either case, this makes them less applicable to realistic applications.
Tuyen