Nonvacuous Generalization Bounds for Deep Neural Networks via PAC-Bayes

Speaker

Daniel Roy
University of Toronto

Host

David Sontag
MIT CSAIL

Abstract:

A serious impediment to a rigorous understanding of the generalization performance of algorithms like SGD for neural networks is that most generalization bounds are numerically vacuous when applied to modern networks on real data sets. In recent work (Dziugaite and Roy, UAI 2017), we argue that it is time to revisit the problem of computing nonvacuous bounds, and show how the empirical phenomenon of "flat minima" can be operationalized using PAC-Bayesian bounds, yielding the first nonvacuous bounds for a large (stochastic) neural network on MNIST. The bound is obtained by first running SGD and then optimizing the distribution of a random perturbation of the weights so as to capture the flatness and minimize the PAC-Bayes bound. I will describe this work, its antecedents, its goals, and subsequent work, focusing on where others have and have not made progress towards understanding generalization according to our strict criteria.
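
For reference, the PAC-Bayesian bound underlying this approach can be stated roughly as follows (this is one standard form; constants differ slightly across versions, and the precise statement used is in arXiv:1703.11008). With probability at least 1 − δ over an i.i.d. training sample S of size m, simultaneously for every distribution Q over network weights,

\[
\mathrm{kl}\!\left(\hat{e}_Q(S) \,\middle\|\, e_Q\right) \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{m}{\delta}}{m - 1},
\]

where \(\hat{e}_Q(S)\) is the empirical error of the stochastic classifier defined by Q, \(e_Q\) is its expected error under the data distribution, P is a prior fixed before seeing S, \(\mathrm{kl}\) is the KL divergence between Bernoulli distributions, and \(\mathrm{KL}(Q\|P)\) is the KL divergence between posterior and prior. Roughly speaking, the paper takes Q to be a Gaussian centered near the SGD solution with a learnable diagonal covariance (capturing flatness) and P to be a Gaussian centered at the random initialization, and then minimizes an upper bound on the right-hand side by stochastic gradient methods.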

Joint work with Gintare Karolina Dziugaite, based on https://arxiv.org/abs/1703.11008, https://arxiv.org/abs/1712.09376, and https://arxiv.org/abs/1802.09583

Bio:

Daniel Roy is an Assistant Professor in the Department of Statistical Sciences and, by courtesy, Computer Science at the University of Toronto, and a founding faculty member of the Vector Institute for Artificial Intelligence. Daniel is a recent recipient of an Ontario Early Researcher Award and a Google Faculty Research Award. Before joining U of T, Daniel held a Newton International Fellowship from the Royal Academy of Engineering and a Research Fellowship at Emmanuel College, University of Cambridge. Daniel earned his S.B., M.Eng., and Ph.D. from the Massachusetts Institute of Technology; his dissertation on probabilistic programming won an MIT EECS Sprowls Dissertation Award. Daniel's group works on the foundations of machine learning and statistics.