
There is a "folklore" result that gradient descent on a non-convex function takes $O(\frac{1}{\epsilon^2})$ steps to reach a point whose gradient norm is below $\epsilon$, and that SGD needs $O(\frac{1}{\epsilon^4})$ steps to do the same.

  • Can someone share a reference where this is proven?

I am aware of the recent references where these rates have been improved, but I have not been able to locate a pedagogical presentation of the older results.
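
For concreteness, here is a sketch of the argument I believe sits behind the first rate. It rests on assumptions not stated above: $f$ is $L$-smooth (its gradient is $L$-Lipschitz), bounded below by $f^*$, and the step size is $1/L$, so $x_{k+1} = x_k - \frac{1}{L}\nabla f(x_k)$. Smoothness gives the descent inequality

$$f(x_{k+1}) \le f(x_k) - \frac{1}{L}\|\nabla f(x_k)\|^2 + \frac{1}{2L}\|\nabla f(x_k)\|^2 = f(x_k) - \frac{1}{2L}\|\nabla f(x_k)\|^2 .$$

Summing over $k = 0, \dots, T-1$ and using $f(x_T) \ge f^*$:

$$\min_{0 \le k < T} \|\nabla f(x_k)\|^2 \le \frac{1}{T}\sum_{k=0}^{T-1} \|\nabla f(x_k)\|^2 \le \frac{2L\,(f(x_0) - f^*)}{T},$$

so some iterate has gradient norm at most $\epsilon$ once $T \ge \frac{2L\,(f(x_0) - f^*)}{\epsilon^2}$. For SGD, assuming in addition unbiased stochastic gradients with variance at most $\sigma^2$ and a constant step $\eta \le 1/L$, the same steps in expectation should give

$$\frac{1}{T}\sum_{k=0}^{T-1} \mathbb{E}\,\|\nabla f(x_k)\|^2 \le \frac{2\,(f(x_0) - f^*)}{\eta T} + L\eta\sigma^2 ,$$

and taking $\eta \propto 1/\sqrt{T}$ makes the right-hand side $O(1/\sqrt{T})$, which is below $\epsilon^2$ for $T = O(\frac{1}{\epsilon^4})$. This is only my reconstruction under the assumptions listed, not a citable proof.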

gradstudent

0 Answers