[Scale ML] Diffusion Forcing: Next-Token Prediction Meets Full-Sequence Diffusion

Speaker: Boyuan Chen

Host: Scale ML

Topic: Diffusion Forcing: Next-Token Prediction Meets Full-Sequence Diffusion

Date: Wednesday, Feb 5

Time: 3:00 PM (EST)

Zoom: https://mit.zoom.us/j/91697262920 (password: mitmlscale)

 

Abstract. 

Diffusion Forcing is a new sequence diffusion paradigm that combines the strengths of full-sequence diffusion models (like Sora) and next-token models (like LLMs): at sampling time, a single trained model can act as either, or as a mix of the two, without retraining. Diffusion Forcing's unique properties enable a wide range of applications across video diffusion, planning, and robotics. We further present History Guidance, a technique uniquely enabled by Diffusion Forcing that significantly improves video diffusion's consistency, allowing one to roll out extremely long videos that were previously impossible.
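The core idea behind this flexibility is training each token in a sequence at an independently sampled noise level, so that the same model can later be sampled with one shared noise level per step (full-sequence diffusion) or with a causal schedule where earlier tokens are already clean (next-token-style generation). The following toy sketch illustrates only the noise-level scheduling; the `corrupt` function and all names here are hypothetical illustrations, not the speaker's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(tokens, noise_levels, rng):
    """Toy corruption: mix each token with Gaussian noise at its own level.

    noise_levels are in [0, 1]: 0 keeps the token clean, 1 replaces it
    with pure noise. (Illustrative only, not an actual diffusion schedule.)
    """
    noise = rng.normal(size=tokens.shape)
    k = noise_levels[:, None]
    return np.sqrt(1.0 - k) * tokens + np.sqrt(k) * noise

T, D = 4, 2                      # sequence length, token dimension
tokens = rng.normal(size=(T, D))

# Training: every token gets an independent noise level.
train_levels = rng.uniform(0.0, 1.0, size=T)
noisy = corrupt(tokens, train_levels, rng)

# Sampling regime 1: full-sequence diffusion, one shared level per step.
full_seq_levels = np.full(T, 0.5)

# Sampling regime 2: autoregressive-style, earlier tokens cleaner than
# later ones (a causal, monotonically increasing schedule).
ar_levels = np.linspace(0.0, 1.0, T)
```

Because the model is trained across all such per-token noise combinations, both sampling regimes (and mixtures of them) are valid inputs to the same network.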

 

Bio. 

Boyuan Chen is a PhD student at MIT working with Prof. Vincent Sitzmann and Prof. Russ Tedrake. He is interested in model-based reinforcement learning, generative world models, and their applications in embodied intelligence. Boyuan hopes to leverage video world models trained on internet-scale data as planners for general-purpose robots, replicating LLMs' success in the visual world. Previously, Boyuan interned at Google DeepMind and Google X, working on equipping Google's foundation models with spatial reasoning capabilities.