In this thesis defense, I will present my work on the "Lottery Ticket Hypothesis," which offers a new perspective on how neural networks learn in practice and how we can make that process more efficient. We have known for decades that it is possible to delete up to 90% of the connections from a trained neural network, a process known as pruning, without harming its accuracy. In my thesis work, I showed that it is also possible to train such pruned networks from the start of training or close to it, something the previous consensus deemed impossible. The takeaway is that neural networks can learn successfully with far less capacity than we typically provide. This finding has significant practical and scientific implications. Practically, it reveals an opportunity to dramatically reduce the cost of training the enormous models that are increasingly out of reach for all but the best-resourced companies. Scientifically, it suggests, surprisingly, that the capacity a neural network needs to learn a function is similar to the capacity it needs to represent that function.
I will present the initial work on the Lottery Ticket Hypothesis (ICLR 2019 Best Paper Award); the follow-up work that shows how to scale these findings up and offers insight into when and why sparse, trainable networks exist (Linear Mode Connectivity and the Lottery Ticket Hypothesis, ICML 2020); and the current state of efforts to exploit these findings for practical purposes (Pruning Neural Networks at Initialization: Why are we missing the mark?, ICLR 2021). I will close by discussing the implications of this work, including the many new research directions it has catalyzed (in neural network pruning, efficient training, loss landscape analysis, model averaging for ensembling, and deep learning theory), and how this empirical approach to understanding and improving deep learning has evolved into the basis for my startup, MosaicML.