Natasha 2: Faster Non-Convex Optimization Than SGD

  • Zeyuan Allen-Zhu

NIPS 2018

We design a stochastic algorithm to train any smooth neural network to ε-approximate local minima, using O(ε^{-3.25}) backpropagations. The best previously known result was essentially O(ε^{-4}), achieved by SGD. More broadly, the algorithm finds ε-approximate local minima of any smooth nonconvex function at rate O(ε^{-3.25}), with only oracle access to stochastic gradients.
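
To make the size of the improvement concrete (an illustrative calculation, not a figure from the paper): the two rates differ by a factor of ε^{-0.75}, so at ε = 10^{-2},

    O(ε^{-4})    = 10^{8} backpropagations,
    O(ε^{-3.25}) = 10^{6.5} ≈ 3.2 × 10^{6} backpropagations,

i.e. roughly a 32× reduction, and the gap widens further as ε shrinks.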