A picture of the energy landscape of deep neural networks
Speaker
Pratik Chaudhari
UCLA
Host
Bolei Zhou
MIT CSAIL
Abstract:
Stochastic gradient descent (SGD) is the gold standard of optimization in deep learning. It does not, however, exploit the special structure and geometry of the loss functions we wish to optimize, namely those of deep neural networks. In this talk, we will focus on the geometry of the energy landscape at local minima with the aim of understanding the generalization properties of deep networks.
In practice, the Hessian at optima discovered by SGD has a large proportion of near-zero eigenvalues, with only a few significantly positive or negative ones. We will first build on this observation to construct an algorithm, Entropy-SGD, that maximizes a local version of the free energy. Such a loss function favors flat regions of the energy landscape, which are robust to perturbations and hence generalize better, while avoiding sharp, poorly generalizing (although possibly deep) valleys. We will discuss connections of this algorithm with belief propagation and robust ensemble learning. Furthermore, we will establish a tight connection between such non-convex optimization algorithms and nonlinear partial differential equations. Empirical validation on CNNs and RNNs shows that Entropy-SGD and related algorithms compare favorably to state-of-the-art techniques in terms of both generalization error and training time.
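For concreteness, the "local version of the free energy" referred to above can be written as follows. This is a reconstruction consistent with the first linked paper (arXiv:1611.01838), not a quote from the talk; here f denotes the training loss, x the network weights, and gamma a "scope" parameter that sets the width of the neighborhood:

F(x; \gamma) \;=\; \log \int \exp\!\Big(-f(x') - \tfrac{\gamma}{2}\,\|x - x'\|_2^2\Big)\, dx'.

Entropy-SGD performs gradient ascent on F(x; gamma): the integral is large when x sits in a wide valley where many nearby x' also have low loss, and small at sharp minima, which is how the objective encodes the flatness bias described above.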
arXiv: https://arxiv.org/abs/1611.01838, https://arxiv.org/abs/1704.04932
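As an illustration only (not the speaker's reference implementation), here is a minimal NumPy sketch of one Entropy-SGD-style outer step: an inner loop runs a few steps of stochastic gradient Langevin dynamics on the modified loss f(x') + (gamma/2)||x - x'||^2, keeps an exponential average mu of the inner iterates, and then moves x toward mu, i.e. along an estimate of the gradient of the local free energy. All names and default values (grad_f, eta, eta_prime, eps, alpha, L) are assumptions chosen for this sketch.

```python
import numpy as np

def entropy_sgd_step(x, grad_f, eta=0.1, gamma=1e-3, L=20,
                     eta_prime=0.1, eps=1e-4, alpha=0.75, rng=None):
    """One outer step of an Entropy-SGD-style update (sketch).

    x       : current parameters (1-D NumPy array)
    grad_f  : callable returning a stochastic gradient of the loss at a point
    gamma   : coupling ("scope") of the local free-energy term
    L       : number of inner Langevin (SGLD) iterations
    """
    rng = np.random.default_rng() if rng is None else rng
    x_prime = x.copy()   # inner variable x'
    mu = x.copy()        # running average of inner iterates
    for _ in range(L):
        # SGLD step on the modified loss f(x') + gamma/2 * ||x - x'||^2
        g = grad_f(x_prime) - gamma * (x - x_prime)
        x_prime = (x_prime - eta_prime * g
                   + np.sqrt(eta_prime) * eps * rng.standard_normal(x.shape))
        mu = (1 - alpha) * mu + alpha * x_prime  # exponential average
    # Outer update: move x toward the average of the inner samples.
    return x - eta * gamma * (x - mu)

# Toy usage: minimize f(w) = 0.5 * ||w||^2 with a noisy gradient oracle.
rng = np.random.default_rng(0)
grad_f = lambda w: w + 0.01 * rng.standard_normal(w.shape)
w = rng.standard_normal(10)
for _ in range(100):
    w = entropy_sgd_step(w, grad_f, rng=rng)
```

The design choice worth noting is the two time scales: the inner Langevin loop samples the neighborhood of the current weights, while the slower outer update follows the resulting smoothed gradient, biasing the trajectory toward wide, flat valleys.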
Bio:
Pratik Chaudhari is a PhD candidate in Computer Science at UCLA. With his advisor Stefano Soatto, he focuses on optimization algorithms for deep networks. He holds Master's and Engineer's degrees in Aeronautics and Astronautics from MIT, where he worked on stochastic estimation and randomized motion planning algorithms for urban autonomous driving with Emilio Frazzoli.