Equilibrated adaptive learning rates for non-convex optimization


Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work strongly suggesting that most of the critical points encountered when training such networks are saddle points, we examine how the presence of negative eigenvalues of the Hessian can inform the design of better-suited adaptive learning rate schemes. We show that the popular Jacobi preconditioner has undesirable behavior in the presence of both positive and negative curvature, and present theoretical and empirical evidence that the so-called equilibration preconditioner is comparatively better suited to non-convex problems. We introduce a novel adaptive learning rate scheme, called ESGD, based on the equilibration preconditioner. Our experiments demonstrate that both schemes yield very similar step directions, but that ESGD sometimes surpasses RMSProp in convergence speed, while always clearly improving over plain stochastic gradient descent.
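To make the equilibration idea concrete, below is a minimal NumPy sketch of an equilibrated SGD update, assuming the preconditioner is estimated as D_i ≈ sqrt(E_v[(Hv)_i^2]) with v drawn from a standard Gaussian, and assuming Hessian-vector products are approximated by central finite differences on the gradient. The function names (`hessian_vector_product`, `esgd`) and the hyperparameters are illustrative choices, not the authors' reference implementation.

import numpy as np

def hessian_vector_product(grad_fn, theta, v, eps=1e-5):
    # Central finite-difference approximation of H v:
    #   Hv ≈ (∇f(θ + εv) − ∇f(θ − εv)) / (2ε)
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)

def esgd(grad_fn, theta0, lr=0.01, damping=1e-4, steps=1000, seed=0):
    """Sketch of equilibrated SGD: scale the gradient by an estimate of
    the equilibration preconditioner D_i = sqrt(E_v[(Hv)_i^2]), which
    stays well defined even where the Hessian has negative eigenvalues."""
    rng = np.random.default_rng(seed)
    theta = theta0.astype(float).copy()
    D = np.zeros_like(theta)                      # running sum of (Hv)^2
    for k in range(1, steps + 1):
        g = grad_fn(theta)
        v = rng.standard_normal(theta.shape)      # v ~ N(0, I)
        Hv = hessian_vector_product(grad_fn, theta, v)
        D += Hv ** 2                              # Monte Carlo accumulator
        precond = np.sqrt(D / k) + damping        # ≈ equilibration scaling
        theta -= lr * g / precond
    return theta

# Illustrative usage on an ill-conditioned quadratic (hypothetical toy):
# curvature differs by four orders of magnitude across the two coordinates,
# yet the equilibrated update rescales both to comparable step sizes.
A = np.diag([100.0, 0.01])
theta_star = esgd(lambda th: A @ th, np.array([1.0, 1.0]), lr=0.1, steps=500)

Note that squaring (Hv)_i before averaging is what distinguishes this from a Jacobi-style preconditioner: it measures curvature magnitude, so directions of negative curvature are not cancelled or mis-scaled. A practical implementation would amortize the cost of the Hessian-vector product by refreshing the estimate only every few iterations rather than at every step, as done here for simplicity.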


Wednesday, December 9, 2015, 17:40 - 18:00
Room 210 A
