Visual Computing Seminar | Tim Brooks - Sora: Video Generation Models as World Simulators

Speaker

Tim Brooks
OpenAI

Host

Yang Liu
MIT EECS & CSAIL

Virtual session of the MIT Visual Computing Seminar, Spring 2024, featuring invited speaker Tim Brooks (OpenAI), joining remotely.

The format is ~25 minutes of talk followed by Q&A. Given the expected audience size, we will use Slido for live Q&A and answer the top questions from the upvote queue. [live Q&A link] https://tinyurl.com/TimBrooksMIT

Please DO NOT record this talk by any means. Thanks for your understanding.


Title
Sora: Video Generation Models as World Simulators

Abstract
We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high fidelity video. Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.
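
To make the "spacetime patches" idea in the abstract concrete, here is a minimal Python sketch of turning video latent codes into a token sequence for a transformer. The latent shape, patch sizes, and the `to_spacetime_patches` function are hypothetical illustrations; the abstract does not specify Sora's actual tokenization.

```python
import torch

def to_spacetime_patches(latents, pt=2, ph=2, pw=2):
    """Split a video latent tensor into flattened spacetime patch tokens.

    latents: (T, H, W, C) latent codes from a video encoder (shape assumed).
    pt, ph, pw: hypothetical patch sizes along time, height, and width.
    Returns: (num_patches, pt*ph*pw*C) token sequence for a transformer.
    """
    T, H, W, C = latents.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    # Carve the latent volume into non-overlapping spacetime blocks.
    x = latents.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Group the block-grid dims together, then the within-block dims.
    x = x.permute(0, 2, 4, 1, 3, 5, 6)
    # Flatten each block into one token.
    return x.reshape(-1, pt * ph * pw * C)

# Example: 16 latent frames at 32x32 spatial resolution, 4 channels.
latents = torch.randn(16, 32, 32, 4)
tokens = to_spacetime_patches(latents)
print(tokens.shape)  # torch.Size([2048, 32])
```

Because patching is defined over whatever latent grid comes in, the same scheme accommodates variable durations, resolutions, and aspect ratios: an image is simply a video with T equal to one patch of time.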

Bio
Tim Brooks is a research scientist at OpenAI, where he co-leads Sora, their video generation model. His research investigates large-scale generative models that simulate the physical world. Tim received his PhD from Berkeley AI Research, advised by Alyosha Efros, where he invented InstructPix2Pix. He previously worked at Google on the AI that powers the Pixel phone's camera, and at NVIDIA on video generation models.