[Scale ML] Diffusion Forcing: Next-Token Prediction Meets Full-Sequence Diffusion
Speaker: Boyuan Chen
Topic: Diffusion Forcing: Next-Token Prediction Meets Full-Sequence Diffusion
Date: Wednesday, Feb 5
Time: 3:00 PM (EST)
Zoom: https://mit.zoom.us/j/91697262920 (password: mitmlscale)
Abstract.
Diffusion Forcing is a new sequence diffusion paradigm that combines the strengths of full-sequence diffusion models (like Sora) and next-token models (like LLMs): at sampling time, a single trained model can act as either, or as a mix of the two, without retraining. Diffusion Forcing's unique properties enable a wide range of applications across video diffusion, planning, and robotics. We further present History Guidance, a technique uniquely enabled by Diffusion Forcing that significantly improves the consistency of video diffusion, allowing one to roll out extremely long videos that were previously impossible.
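The key flexibility described above comes from letting each token in a sequence carry its own noise level, so the choice of denoising schedule at sampling time determines the behavior. The sketch below is illustrative only (not from the talk, and the function names are hypothetical): it builds two per-token noise-level schedules, one where all tokens denoise in lockstep (full-sequence diffusion) and one where each token stays fully noisy until its predecessors are denoised (next-token prediction).

```python
import numpy as np

def full_sequence_schedule(num_tokens: int, num_steps: int) -> np.ndarray:
    # All tokens share a single noise level that anneals together,
    # mimicking standard full-sequence diffusion sampling.
    # Rows are denoising steps, columns are tokens; 1.0 = pure noise, 0.0 = clean.
    levels = np.linspace(1.0, 0.0, num_steps)
    return np.tile(levels[:, None], (1, num_tokens))

def autoregressive_schedule(num_tokens: int, steps_per_token: int) -> np.ndarray:
    # Token t stays at full noise until all earlier tokens are denoised,
    # then anneals to clean, mimicking next-token (autoregressive) sampling.
    num_steps = num_tokens * steps_per_token
    grid = np.ones((num_steps, num_tokens))
    for t in range(num_tokens):
        start = t * steps_per_token
        grid[start:start + steps_per_token, t] = np.linspace(1.0, 0.0, steps_per_token)
        grid[start + steps_per_token:, t] = 0.0  # stays clean afterwards
    return grid
```

Intermediate schedules (e.g. a "pyramid" where later tokens lag earlier ones by a few steps) interpolate between these two extremes, which is what allows one model to mix both behaviors at sampling time.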
Bio.
Boyuan Chen is a PhD student at MIT working with Prof. Vincent Sitzmann and Prof. Russ Tedrake. He is interested in model-based reinforcement learning, generative world models, and their applications in embodied intelligence. Boyuan hopes to leverage video world models trained on internet-scale data as planners for general-purpose robots, replicating the success of LLMs in the visual world. Previously, Boyuan interned at Google DeepMind and Google X, working on equipping Google's foundation models with spatial reasoning capabilities.