On the Foundations of Deep Learning: SGD, Overparametrization, and Generalization


Jason D. Lee
University of Southern California


Aleksander Madry

Deep Learning has had phenomenal empirical successes in many domains including computer vision, natural language processing, and speech recognition. To consolidate and boost the empirical success, we need to develop a more systematic and deeper understanding of the elusive principles of deep learning.

In this talk, I will provide theoretical analysis of several elements of deep learning including non-convex optimization, overparametrization, and generalization error. First, we show that gradient descent and many other algorithms are guaranteed to converge to a local minimizer of the loss. For several interesting problems including the matrix completion problem, this guarantees that we converge to a global minimum. Then we will show that gradient descent converges to a global minimizer for deep overparametrized networks. Finally, we analyze the generalization error by showing that a subtle combination of SGD, logistic loss, and architecture combine to promote large margin classifiers, which are guaranteed to have low generalization error. Together, these results show that on overparametrized deep networks SGD finds solution of both low train and test error.


Jason Lee is an assistant professor in Data Sciences and Operations at the University of Southern California. Prior to that, he was a postdoctoral researcher at UC Berkeley working with Michael Jordan. Jason received his PhD at Stanford University advised by Trevor Hastie and Jonathan Taylor. His research interests are in statistics, machine learning, and optimization. Lately, he has worked on high dimensional statistical inference, analysis of non-convex optimization algorithms, and theory for deep learning.