Thesis Defense: Learning Reconfigurable Vision Models

Host: John Guttag (CSAIL, MIT)
Abstract: Deep-learning models are notorious for their high computational costs and substantial data requirements. Furthermore, non-expert users often lack the expertise needed to effectively tailor these models to their applications. In this talk, we tackle these challenges by amortizing the cost of training across models that solve similar learning tasks. Instead of training multiple models independently, we propose learning a single, reconfigurable model that effectively captures the spectrum of underlying problems.

First, we present UniverSeg, an in-context learning method for universal biomedical image segmentation. Given a query image and an example set of image-label pairs that defines a new segmentation task, our model produces accurate segmentations without additional training, outperforming several related methods on unseen segmentation tasks. Second, we demonstrate the effectiveness of hypernetworks for amortizing the cost of training multiple models. We characterize a hypernetwork training instability and propose a revised formulation that leads to faster convergence and more stable training. We then introduce Scale-Space Hypernetworks, a method for learning a continuum of CNNs with varying efficiency characteristics. This enables us to characterize an entire accuracy-efficiency Pareto curve of models by training a single hypernetwork, dramatically reducing training costs.
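To make the hypernetwork idea concrete, here is a minimal, illustrative sketch (not the thesis implementation; all names and sizes are assumptions): a small MLP maps a scalar hyperparameter, such as a scale or efficiency setting, to the full weight vector of a tiny target model, so a single set of hypernetwork parameters yields a continuum of target models.

```python
import numpy as np

# Illustrative hypernetwork sketch (assumed architecture and names,
# randomly initialized; no training loop shown).
rng = np.random.default_rng(0)

TARGET_IN, TARGET_OUT = 4, 2        # target model: one linear layer
HIDDEN = 8                          # hypernetwork hidden width
N_TARGET = TARGET_IN * TARGET_OUT   # number of weights to generate

# Hypernetwork parameters: a 1 -> HIDDEN -> N_TARGET MLP.
W1 = rng.normal(0.0, 0.5, (HIDDEN, 1))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.5, (N_TARGET, HIDDEN))
b2 = np.zeros(N_TARGET)

def hypernetwork(scale: float) -> np.ndarray:
    """Map a scalar hyperparameter to the target model's weight matrix."""
    h = np.tanh(W1 @ np.array([scale]) + b1)
    return (W2 @ h + b2).reshape(TARGET_OUT, TARGET_IN)

def target_forward(x: np.ndarray, scale: float) -> np.ndarray:
    """Run the target linear model with weights generated for `scale`."""
    return hypernetwork(scale) @ x

# Varying the scalar sweeps out a family of models without retraining:
x = rng.normal(size=TARGET_IN)
y_small = target_forward(x, 0.25)
y_large = target_forward(x, 1.00)
```

In the actual thesis setting the generated weights parameterize CNNs of varying cost, and the hypernetwork is trained jointly over a distribution of scale values, so evaluating the accuracy-efficiency trade-off requires only one training run.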

Committee: John Guttag (MIT), Adrian Dalca (MIT/HMS/MGH), Michael Carbin (MIT)