(THESIS DEFENSE) Efficient Generative Models for Visual Synthesis

Speaker: Tianwei Yin

Speaker Affiliation: MIT CSAIL

Host: Fredo Durand, William T. Freeman

Host Affiliation: MIT CSAIL

Abstract:

While current visual generative models achieve remarkable quality, their high computational cost and latency limit their use in interactive applications. In this talk, I will present my research on improving the efficiency of generative models for image and video creation. I will begin by introducing distribution matching distillation, a technique that enables the training of one- or few-step visual generators by distilling knowledge from powerful yet computationally expensive diffusion models. Next, I will present improved distillation methods that enhance robustness and scalability, leading to a production-grade few-step image generator that is now deployed in widely used software, generating hundreds of millions of images annually. Finally, I will show how switching to an autoregressive generation paradigm further reduces latency for video generation, enabling fast interactive video generation and world simulation.

Thesis Committee: Fredo Durand (Thesis Supervisor), William T. Freeman (Thesis Supervisor), Vincent Sitzmann, Kaiming He

Date: March 13, 2025

Time: 10:30 AM to 11:30 AM (America/New_York)

Location