EECS Special Seminar: Hao Liu, "Towards a Machine Capable of Learning Everything"

Speaker

Hao Liu
EECS Department - UC Berkeley

Host

Leslie Kaelbling

Abstract:
Large generative models such as ChatGPT have produced remarkable results and revolutionized artificial intelligence. In this talk, I will discuss my research on advancing the foundations of these models, centered on removing the architectural bottlenecks that stand in the way of learning from everything. First, I will describe our efforts to remove the context-size limitations of the transformer architecture. Our new model architecture and training method allow context sizes that are effectively unbounded, without approximations, and the proposed technique has been used to build state-of-the-art open-source and proprietary models. I will then discuss applications of large context in world model learning and in reinforcement learning, including the Large World Model, the world's first multimodal model with million-length context, and the training methodologies it requires. Next, I will introduce my research on unsupervised exploration, which pioneered learning beyond existing knowledge, allowing unsupervised pretrained models to outperform human experts in gameplay and paving the way for models that go beyond imitating existing data. Finally, I will outline the modeling and training paradigms for the next generation of large generative models we should build, focusing on advances in neural network architecture, efficient scaling, large-context reasoning, and discovery.
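The abstract does not spell out the mechanism, but the bio names Blockwise Transformers and RingAttention, whose core idea is to compute exact softmax attention one key/value block at a time with an online (streaming) softmax, so memory does not grow with the full sequence-by-sequence attention matrix. The sketch below is a toy NumPy illustration of that idea under my own assumptions (the function name blockwise_attention, the block size, and the tensor shapes are illustrative choices), not the speaker's implementation.

# Minimal sketch of blockwise attention with an online softmax (illustrative only).
import numpy as np

def blockwise_attention(q, k, v, block_size=128):
    """Compute softmax(q k^T / sqrt(d)) v one key/value block at a time,
    so the full (seq x seq) attention matrix is never materialized."""
    seq_len, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)                        # running (un-normalized) numerator
    denom = np.zeros((seq_len, 1))                # running softmax denominator
    running_max = np.full((seq_len, 1), -np.inf)  # running row-wise max for stability

    for start in range(0, seq_len, block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        scores = (q @ k_blk.T) * scale            # (seq, block) scores for this block

        new_max = np.maximum(running_max, scores.max(axis=-1, keepdims=True))
        correction = np.exp(running_max - new_max)  # rescale previously accumulated stats
        p = np.exp(scores - new_max)

        out = out * correction + p @ v_blk
        denom = denom * correction + p.sum(axis=-1, keepdims=True)
        running_max = new_max

    return out / denom

# Sanity check against the naive quadratic-memory implementation.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(512, 64)) for _ in range(3))
naive = np.exp((q @ k.T) / np.sqrt(64))
naive = (naive / naive.sum(-1, keepdims=True)) @ v
assert np.allclose(blockwise_attention(q, k, v), naive, atol=1e-6)

Because each block's contribution is exact and only running statistics are kept, the result matches full attention while the per-step memory depends on the block size rather than the total context length; distributing the blocks across devices is the rough intuition behind ring-style scaling.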

Bio:
Hao Liu is a final-year Ph.D. candidate in the Department of Electrical Engineering and Computer Sciences at UC Berkeley, where he is advised by Pieter Abbeel. During his Ph.D., he also spent two years part-time at Google Brain and DeepMind. His research focuses on the foundations of generative models, spanning machine learning and neural network architectures, with the goal of developing computationally scalable approaches to generalization. He recently developed the Large World Model (LWM) and architectural advances (Blockwise Transformers and RingAttention) for scaling transformers. Earlier, he pioneered general and scalable unsupervised exploration (APT and APS). His work on million-length contexts has been influential at Google, Meta, and across the broader industry. Several of his papers have been selected for spotlight and oral presentations at top-tier machine learning conferences, and his work has been featured in popular media, including MarkTechPost, Business Insider, and ZDNet.