Learning algorithms based on deep neural networks are well known to (nearly) perfectly fit the training set, and to fit even random labels well. The reasons for this tendency to memorize the labels of the training data are not well understood.
We provide a simple model of prediction problems in which such memorization is necessary for achieving close-to-optimal generalization error. In our model, data is sampled from a mixture of subpopulations, and the frequencies of these subpopulations are chosen from some prior. Our analysis demonstrates that memorization becomes necessary whenever the frequency prior is long-tailed. Image and text data are known to follow such long-tailed distributions, so our results establish a formal link between the long-tailed nature of real data and the memorization observed in practice. We complement the theoretical results with experiments on several standard benchmarks showing that memorization is an essential part of deep learning.
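To get intuition for why a long-tailed frequency prior forces memorization, here is a minimal simulation sketch (not from the talk; the Zipf prior and the population/sample sizes are illustrative assumptions). It draws subpopulation frequencies from a long-tailed prior, samples a training set, and counts the examples whose subpopulation appears exactly once — the examples whose labels a close-to-optimal learner has no choice but to memorize.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 10_000  # number of subpopulations (illustrative assumption)
n = 10_000  # training-set size (illustrative assumption)

# Long-tailed prior over subpopulation frequencies:
# pi_i proportional to 1/i (a Zipf distribution with exponent 1).
freqs = 1.0 / np.arange(1, N + 1)
freqs /= freqs.sum()

# Draw each training example's subpopulation index from the prior.
sample = rng.choice(N, size=n, p=freqs)

# Count subpopulations represented by exactly one training example.
counts = np.bincount(sample, minlength=N)
singletons = int(np.sum(counts == 1))
print(f"singleton fraction of training set: {singletons / n:.2f}")
```

Under such a prior a constant fraction of the training set consists of singletons, so ignoring them (i.e., refusing to memorize) costs a constant amount of generalization error.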
Based on https://arxiv.org/abs/1906.05271 and ongoing work with Chiyuan Zhang.
Vitaly Feldman is a research scientist at Google working on the design and theoretical analysis of machine learning algorithms. His recent research interests include stability-based and information-theoretic tools for the analysis of generalization, privacy-preserving learning, and adaptive data analysis. Vitaly holds a PhD from Harvard (2006) and was previously a research scientist at IBM Research - Almaden (2007-2017). He serves as a director on the steering committee of the Association for Computational Learning and was a program co-chair of COLT 2016.