I don't have much knowledge of ML, but from my naive point of view it always seems that some variant of gradient descent is used when training neural networks. I was wondering why more advanced methods don't seem to be used, such as SQP algorithms or interior-point methods. Is it because training a neural net is always a simple unconstrained optimization problem, so the above-mentioned methods would be unnecessary? Any insight would be great, thanks.
1 Answer
In my reply to "Does gradient descent always converge to an optimum?", it is explained that standard gradient descent works well because backtracking gradient descent works well (this is proven in our recent paper mentioned in that post), and that in the long run backtracking gradient descent behaves like standard gradient descent.
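For readers unfamiliar with the term, here is a minimal sketch of backtracking gradient descent, i.e. gradient descent with an Armijo-type line search, in Python. The objective, parameter values, and stopping rule below are my own illustrative choices, not the exact scheme analyzed in the paper referenced above.

```python
# A minimal sketch of backtracking (Armijo) gradient descent, for illustration only.
# The test function and all parameter values are hypothetical choices.
import numpy as np

def backtracking_gd(f, grad, x0, alpha0=1.0, beta=0.5, c=1e-4, tol=1e-8, max_iter=10000):
    """Gradient descent where the step size t is shrunk until the Armijo condition
    f(x - t*g) <= f(x) - c*t*||g||^2 holds, then the step is taken."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stop once the gradient is numerically zero
            break
        t = alpha0
        while f(x - t * g) > f(x) - c * t * np.dot(g, g):
            t *= beta                 # backtrack: shrink the step until sufficient decrease
        x = x - t * g
    return x

# Example: minimize f(x, y) = (x - 3)^2 + 2*(y + 1)^2, whose minimizer is (3, -1).
f = lambda x: (x[0] - 3) ** 2 + 2 * (x[1] + 1) ** 2
grad_f = lambda x: np.array([2 * (x[0] - 3), 4 * (x[1] + 1)])
print(backtracking_gd(f, grad_f, [0.0, 0.0]))  # -> approximately [3., -1.]
```

The point of the backtracking step is that no fixed learning rate has to be chosen in advance; the line search adapts the step size at every iteration, which is what makes the convergence guarantees easier to obtain.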
The main issue with other methods, I think, is that they either require overly strong conditions for convergence or have no convergence proof at all. In either case, this makes them less applicable to realistic applications.
Tuyen