A Variational Approach to Object-Centric Image Models and World Models

Speaker

Technion

Host

Pulkit Agrawal

Abstract:
Unsupervised latent variable models are effective tools for representing complex data such as images and for building world models, with applications in robotic manipulation, video generation, novelty detection, and more. Variational Autoencoders (VAEs) provide compact latent representations with stability and efficiency. In this talk, we will explore modern VAEs that mitigate shortcomings of classical approaches, such as blurry images, and that can serve as a basis for strong world models. The first paper (CVPR 2021 Oral) introduces "Soft-IntroVAE", a refined approach to introspective variational autoencoders that improves training stability and offers theoretical insights, and showcases its applications. The second paper (ICML 2022) presents "Deep Latent Particles (DLP)" for unsupervised image representation learning, offering disentangled object features, uncertainty estimation, and versatile applications. Building on DLP, the third paper unveils "DDLP", a novel object-centric algorithm for video prediction, manipulation, and generation that combines efficiency and interpretability with state-of-the-art results.

About Tal:
Tal (https://taldatech.github.io) is a third-year Ph.D. student under the supervision of Prof. Aviv Tamar in the Electrical and Computer Engineering faculty at the Technion, where he also earned his B.Sc. and M.Sc. His research interests include unsupervised representation learning, generative modeling, and reinforcement learning.