EECS Special Seminar: Towards Deeper Understandings of Deep Learning
Speaker
Yuanzhi Li
Stanford University
Host
Prof. Tommi Jaakkola
MIT-CSAIL
Abstract:
Learning through highly complicated, non-convex systems plays an important role in machine learning. Recently, a vast body of empirical work has demonstrated the success of these methods, especially in deep learning. However, the formal study of the principles behind them is much less developed.
This talk will cover a few recent results towards developing such principles. First, we focus on the principle of "over-parameterization". We show that for neural networks such as CNNs, ResNets, and RNNs, given sufficient over-parameterization, algorithms such as stochastic gradient descent (SGD) provably find a global optimum on the training data. Moreover, the solution also generalizes to the test data as long as the training labels are realizable by certain teacher networks.
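As a rough illustration of this setting (not the construction analyzed in the talk), the sketch below trains a wide two-layer "student" network with plain SGD on labels produced by a much smaller "teacher" network; the widths, sample size, learning rate, and step count are arbitrary choices made only for illustration.

```python
# Minimal sketch (not the talk's construction): an over-parameterized
# two-layer ReLU "student" fit with SGD to labels that are realizable
# by a small "teacher" network. All sizes and hyperparameters are
# illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

d, n_train = 20, 200          # input dimension, number of training points
teacher = nn.Sequential(nn.Linear(d, 5), nn.ReLU(), nn.Linear(5, 1))
X = torch.randn(n_train, d)
with torch.no_grad():
    y = teacher(X)            # training labels produced by the teacher

# The student is heavily over-parameterized relative to the teacher.
student = nn.Sequential(nn.Linear(d, 2000), nn.ReLU(), nn.Linear(2000, 1))
opt = torch.optim.SGD(student.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for step in range(3000):
    opt.zero_grad()
    loss = loss_fn(student(X), y)
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.2e}")  # should decrease steadily
```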
The second result covers the principle of "being noisy". We show that, for certain data sets, the neural network found by SGD with a large learning rate (i.e., step size) at the beginning, followed by learning rate decay, generalizes better than the one found by SGD with a small learning rate, even when both runs reach the same training loss.
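The two training protocols being contrasted can be set up concretely as in the sketch below. This only illustrates the schedule mechanics; the toy model, data, and hyperparameters are made up for illustration and will not necessarily reproduce the generalization gap described above.

```python
# Sketch of the two protocols: SGD with a large initial learning rate that
# is decayed partway through training, versus SGD with a small constant
# learning rate. Model, data, and hyperparameters are illustrative only.
import torch
import torch.nn as nn

def train(lr, milestones, steps=3000, seed=0):
    torch.manual_seed(seed)
    # toy binary classification data (stand-in for a real data set)
    X, X_test = torch.randn(500, 20), torch.randn(500, 20)
    y = (X[:, 0] * X[:, 1] > 0).float().unsqueeze(1)
    y_test = (X_test[:, 0] * X_test[:, 1] > 0).float().unsqueeze(1)

    model = nn.Sequential(nn.Linear(20, 200), nn.ReLU(), nn.Linear(200, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    # learning rate is multiplied by 0.1 at each milestone step
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=milestones, gamma=0.1)
    loss_fn = nn.BCEWithLogitsLoss()

    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
        sched.step()

    with torch.no_grad():
        test_acc = ((model(X_test) > 0).float() == y_test).float().mean().item()
    return test_acc

print("large lr, then decay:", train(lr=0.2, milestones=[2000]))
print("small constant lr   :", train(lr=0.02, milestones=[]))
```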
Bio: Yuanzhi Li is a postdoctoral researcher in the Computer Science Department at Stanford University. Previously, he obtained his Ph.D. at Princeton (2014-2018), advised by Sanjeev Arora. His research interests include deep learning, non-convex optimization, algorithms, and online learning.