Is Learning Compatible with (Over)fitting to the Training Data?

Speaker:
Sasha Rakhlin
MIT IDSS

Host:
David Sontag
MIT CSAIL

Abstract:
We revisit the basic question: can a learning method be successful if it perfectly fits (interpolates/memorizes) the data? The question is motivated by the good out-of-sample performance of "overparametrized" deep neural networks that have the capacity to fit the training data exactly, even if the labels are randomized. The conventional wisdom in Statistics and ML is to regularize the solution and avoid data interpolation. We challenge this wisdom and propose several interpolation methods that work well, both in theory and in practice. In particular, we present a study of kernel "ridgeless" regression and describe a new phenomenon of implicit regularization, even in the absence of an explicit bias-variance trade-off. We will discuss the nature of successful learning with interpolation, both in regression and classification.
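
For readers unfamiliar with the terminology, "ridgeless" kernel regression refers to taking the ridge penalty to zero, leaving the minimum-norm interpolant of the training data. The snippet below is only an illustrative sketch of that idea (the RBF kernel, toy data, and pseudo-inverse solve are choices made here for illustration, not the speaker's experimental setup): it fits noisy labels exactly yet still produces a sensible out-of-sample curve.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq_dists)

# Toy 1-D data: noisy samples of a smooth target function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

# "Ridgeless" fit: minimum-norm interpolant alpha = K^+ y via the pseudo-inverse,
# i.e. the lambda -> 0 limit of kernel ridge regression.
K = rbf_kernel(X, X)
alpha = np.linalg.pinv(K) @ y

# The solution interpolates: predictions at the training points match the noisy labels.
X_test = np.linspace(-3, 3, 200)[:, None]
y_train_pred = K @ alpha                       # equals y up to numerical error
y_test_pred = rbf_kernel(X_test, X) @ alpha    # out-of-sample predictions
print(np.max(np.abs(y_train_pred - y)))        # close to 0: exact fit of the training data
```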

Bio:
Alexander (Sasha) Rakhlin is an Associate Professor at MIT in the Department of Brain and Cognitive Sciences and the Statistics & Data Science Center. Sasha's research is in Statistics, Machine Learning, and Optimization. He received his bachelor's degrees in mathematics and computer science from Cornell University and his doctoral degree from MIT. He was a postdoc at UC Berkeley EECS before joining the University of Pennsylvania, where he was an associate professor in the Department of Statistics and a co-director of the Penn Research in Machine Learning (PRiML) center. He is a recipient of the NSF CAREER Award, the IBM Research Best Paper Award, the Machine Learning Journal Award, and the COLT Best Paper Award.