We aim to learn language through distant supervision from captioned videos, much as children learn language by interacting with the world around them.
We investigate language in different contexts: from how it is learned to how it is grounded in visual perception, all the way to how machines can readily interact with humans.
Despite what you might see in movies, today’s robots are still very limited in what they can do. They can be great for many repetitive tasks, but their inability to understand the nuances of human language makes them mostly useless for more complicated requests.