[Scale ML] Diffusion Forcing: Next-Token Prediction Meets Full-Sequence Diffusion

Speaker: Boyuan Chen

Host: Scale ML

Topic: Diffusion Forcing: Next-Token Prediction Meets Full-Sequence Diffusion

Date: Wednesday, Feb 5

Time: 3:00 PM (EST)

Zoom: https://mit.zoom.us/j/91697262920 (password: mitmlscale)

 

Abstract. 

Diffusion Forcing is a new sequence diffusion paradigm that combines the strengths of full-sequence diffusion models (like Sora) and next-token models (like LLMs): at sampling time, a single trained model can act as either, or as a mix of the two, without retraining. Diffusion Forcing's unique properties enable a wide range of applications across video diffusion, planning, and robotics. We further present History Guidance, a technique uniquely enabled by Diffusion Forcing that significantly improves video diffusion's consistency, allowing one to roll out extremely long videos that were previously impossible.
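The core idea behind this flexibility is training each token in a sequence at an independently sampled noise level, so that the same model can later be sampled with one shared noise level per step (full-sequence diffusion) or with a causal schedule where earlier tokens are already clean (next-token-style generation). The following toy sketch illustrates only the noise-level scheduling; the `corrupt` function and all names here are hypothetical illustrations, not the speaker's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(tokens, noise_levels, rng):
    """Toy corruption: mix each token with Gaussian noise at its own level.

    noise_levels are in [0, 1]: 0 keeps the token clean, 1 replaces it
    with pure noise. (Illustrative only, not an actual diffusion schedule.)
    """
    noise = rng.normal(size=tokens.shape)
    k = noise_levels[:, None]
    return np.sqrt(1.0 - k) * tokens + np.sqrt(k) * noise

T, D = 4, 2                      # sequence length, token dimension
tokens = rng.normal(size=(T, D))

# Training: every token gets an independent noise level.
train_levels = rng.uniform(0.0, 1.0, size=T)
noisy = corrupt(tokens, train_levels, rng)

# Sampling regime 1: full-sequence diffusion, one shared level per step.
full_seq_levels = np.full(T, 0.5)

# Sampling regime 2: autoregressive-style, earlier tokens cleaner than
# later ones (a causal, monotonically increasing schedule).
ar_levels = np.linspace(0.0, 1.0, T)
```

Because the model is trained across all such per-token noise combinations, both sampling regimes (and mixtures of them) are valid inputs to the same network.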

 

Bio. 

Boyuan Chen is a PhD student at MIT working with Prof. Vincent Sitzmann and Prof. Russ Tedrake. He is interested in model-based reinforcement learning, generative world models, and their applications in embodied intelligence. Boyuan hopes to leverage video world models trained on internet-scale data as planners for general-purpose robots, replicating LLMs' success in the visual world. Previously, Boyuan interned at Google DeepMind and Google X, working on equipping Google's foundation models with spatial reasoning capabilities.