The Journey, Not the Destination: How Data Guides Diffusion Models

Speaker

Josh Vendrow
CSAIL MIT

Host

Sharut Gupta
CSAIL MIT

Abstract: Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity. However, attributing these images back to the training data (that is, identifying the specific training examples that caused an image to be generated) remains a challenge. In this paper, we propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions. Then, we provide a method for computing such attributions efficiently by leveraging recent work on data attribution in the supervised setting. Finally, we apply our method to find (and evaluate) such attributions for diffusion models trained on CIFAR-10 and MS COCO.

Speaker bio: Josh is a second-year PhD student working with Aleksander Madry. His research focuses on building machine learning models that are safe and robust when deployed in the real world.