Is Learning Compatible with (Over)fitting to the Training Data?

Speaker:
Sasha Rakhlin
MIT IDSS

Host:
David Sontag
MIT CSAIL

Abstract:
We revisit the basic question: can a learning method be successful if it perfectly fits (interpolates/memorizes) the data? The question is motivated by the good out-of-sample performance of "overparametrized" deep neural networks that have the capacity to fit the training data exactly, even if the labels are randomized. The conventional wisdom in Statistics and ML is to regularize the solution and avoid data interpolation. We challenge this wisdom and propose several interpolation methods that work well, both in theory and in practice. In particular, we present a study of kernel "ridgeless" regression and describe a new phenomenon of implicit regularization, even in the absence of an explicit bias-variance trade-off. We will discuss the nature of successful learning with interpolation, both in regression and classification.
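
For readers unfamiliar with the terminology, "ridgeless" kernel regression refers to taking the ridge penalty to zero, leaving the minimum-norm interpolant of the training data. The snippet below is only an illustrative sketch of that idea (the RBF kernel, toy data, and pseudo-inverse solve are choices made here for illustration, not the speaker's experimental setup): it fits noisy labels exactly yet still produces a sensible out-of-sample curve.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq_dists)

# Toy 1-D data: noisy samples of a smooth target function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

# "Ridgeless" fit: minimum-norm interpolant alpha = K^+ y via the pseudo-inverse,
# i.e. the lambda -> 0 limit of kernel ridge regression.
K = rbf_kernel(X, X)
alpha = np.linalg.pinv(K) @ y

# The solution interpolates: predictions at the training points match the noisy labels.
X_test = np.linspace(-3, 3, 200)[:, None]
y_train_pred = K @ alpha                       # equals y up to numerical error
y_test_pred = rbf_kernel(X_test, X) @ alpha    # out-of-sample predictions
print(np.max(np.abs(y_train_pred - y)))        # close to 0: exact fit of the training data
```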

Bio:
Alexander (Sasha) Rakhlin is an Associate Professor at MIT in the Department of Brain and Cognitive Sciences and the Statistics & Data Science Center. Sasha's research is in Statistics, Machine Learning, and Optimization. He received his bachelor's degrees in mathematics and computer science from Cornell University and his doctoral degree from MIT. He was a postdoc at UC Berkeley EECS before joining the University of Pennsylvania, where he was an associate professor in the Department of Statistics and a co-director of the Penn Research in Machine Learning (PRiML) center. He is a recipient of the NSF CAREER Award, the IBM Research Best Paper Award, the Machine Learning Journal Award, and the COLT Best Paper Award.