Abstract: Machine learning algorithms excel primarily in settings where an engineer can first reduce the problem to a particular function (e.g., an image classifier), and then collect a substantial number of labeled input-output pairs for that function. In stark contrast, humans are capable of learning from streams of raw sensory data with minimal external instruction. In this talk, I will argue that, in order to build intelligent systems that are as capable as humans, machine learning models should not be trained in the context of one particular application. Instead, we should design systems that are versatile, that can learn in unstructured settings without detailed human-provided labels, and that can accomplish many tasks, all while processing high-dimensional sensory inputs. To do so, these systems must be able to actively explore and experiment, collecting data themselves rather than relying on detailed human labels.
My talk will focus on two key aspects of this goal: generalization and self-supervision. I will first show how we can move away from hand-designed, task-specific representations of a robot’s environment by enabling the robot to learn high-capacity models, such as deep networks, for representing complex skills from raw pixels. Further, I will present an algorithm that learns deep models that can be rapidly adapted to different objects, new visual concepts, or varying environments, leading to versatile behaviors. Beyond such versatility, a hallmark of human intelligence is self-supervised learning. I will discuss how we can allow a robot to learn by playing with objects in the environment without any human supervision. From this experience, the robot can acquire a visual predictive model of the world that can be used for maneuvering many different objects to varying positions. In all settings, our experiments on simulated and real robot platforms demonstrate the ability to scale to complex, vision-based skills with novel objects.
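The fast-adaptation algorithm mentioned above corresponds to the speaker's work on model-agnostic meta-learning (MAML). As a rough illustration of the core idea, here is a minimal first-order sketch on a toy linear-regression task family; the task setup, learning rates, and all function names are illustrative assumptions, not details from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(theta, X, y):
    """Gradient of mean-squared error for a linear model y_hat = X @ theta."""
    return X.T @ (X @ theta - y) / len(y)

def mse(theta, X, y):
    return float(np.mean((X @ theta - y) ** 2))

def sample_task():
    """Toy task family (an assumption for this sketch): y = a*x + b, random a, b."""
    a, b = rng.uniform(-2, 2, size=2)
    X = np.c_[rng.uniform(-1, 1, 20), np.ones(20)]  # feature + bias column
    return X, a * X[:, 0] + b

# First-order meta-training: find an initialization theta such that a
# single inner gradient step adapts well to any task from the family.
theta = np.zeros(2)
inner_lr, outer_lr = 0.1, 0.01
for _ in range(2000):
    X, y = sample_task()
    adapted = theta - inner_lr * loss_grad(theta, X, y)  # inner adaptation step
    theta -= outer_lr * loss_grad(adapted, X, y)         # first-order meta-update

# Meta-test: one gradient step on a brand-new task should reduce its loss.
X, y = sample_task()
before = mse(theta, X, y)
after = mse(theta - inner_lr * loss_grad(theta, X, y), X, y)
```

The full algorithm differentiates through the inner update (a second-order term dropped here for brevity) and applies to deep networks rather than linear models, but the two-loop structure is the same.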
Bio: Chelsea Finn is a PhD candidate in Computer Science at UC Berkeley, studying machine learning for perception and control of embodied systems. She is interested in how learning algorithms can enable machines to acquire common sense, allowing them to learn a variety of complex sensorimotor skills in real-world settings. During her PhD, she has developed deep learning algorithms for concurrently learning visual perception and control in robotic manipulation skills, inverse reinforcement learning methods for scalable acquisition of nonlinear reward functions, and meta-learning algorithms that enable fast, few-shot adaptation in both visual perception and deep reinforcement learning. Chelsea received her Bachelor's degree in Electrical Engineering and Computer Science at MIT. She has also spent time as an intern at Google Brain, working on self-supervised robot learning algorithms using deep predictive models with data from several robot arms. Her research has been recognized through an NSF graduate fellowship, a Facebook fellowship, and the C.V. Ramamoorthy Distinguished Research Award, and her work has been covered by various media outlets, including the New York Times, Wired, and Bloomberg. With Sergey Levine and John Schulman, she also designed and taught a course on deep reinforcement learning that has attracted thousands of followers online. Throughout her career, she has sought to increase the representation of underrepresented minorities within CS and AI by developing an AI outreach camp and a mentoring program, and by leading efforts within the WiML and Berkeley WiCSE communities of women researchers.
For links to papers, videos, and open-sourced code and data, see: https://people.eecs.berkeley.edu/~cbfinn/