Crossing the Vision-Language Boundary

As infants, we acquire spoken language by listening to those around us, almost as if by osmosis. Of course, we aren't just listening, but looking as well; what we expect to hear depends on what we see, and what we expect to see is influenced by what we hear. We are building unsupervised machine learning models that not only learn to recognize words within speech signals and objects within visual images, but also learn to associate these patterns with one another. With these models, we can perform semantic image retrieval from spoken queries, and uncover translations of words across languages by using the visual space as an interlingua.
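To make the idea concrete, here is a minimal sketch of one way such a cross-modal association can be realized: two encoder branches, one for audio and one for images, trained so that matched speech/image pairs land close together in a shared embedding space. The framework, architecture details, and names here (PyTorch, AudioEncoder, ImageEncoder, contrastive_loss, the 512-dimensional embedding, the margin of 1.0) are illustrative assumptions, not a description of our published models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioEncoder(nn.Module):
    """Maps a log-mel spectrogram (batch, 1, n_mels, frames) to a unit-norm embedding."""
    def __init__(self, embed_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool over time and frequency
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return F.normalize(self.proj(h), dim=-1)


class ImageEncoder(nn.Module):
    """Maps an RGB image (batch, 3, H, W) into the same embedding space."""
    def __init__(self, embed_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool over spatial positions
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return F.normalize(self.proj(h), dim=-1)


def contrastive_loss(audio_emb, image_emb, margin=1.0):
    """Margin ranking loss: a matched audio/image pair must score higher,
    by at least `margin`, than mismatched pairs drawn from the same batch."""
    sims = audio_emb @ image_emb.t()        # (batch, batch) similarity matrix
    pos = sims.diag()                        # matched pairs sit on the diagonal
    mask = ~torch.eye(len(sims), dtype=torch.bool, device=sims.device)
    loss_a = F.relu(margin + sims - pos.unsqueeze(1))[mask].mean()  # rank images per audio clip
    loss_i = F.relu(margin + sims - pos.unsqueeze(0))[mask].mean()  # rank audio clips per image
    return loss_a + loss_i


if __name__ == "__main__":
    audio = torch.randn(8, 1, 40, 128)    # toy batch of log-mel spectrograms
    images = torch.randn(8, 3, 224, 224)  # the corresponding images
    a_emb = AudioEncoder()(audio)
    i_emb = ImageEncoder()(images)
    print(contrastive_loss(a_emb, i_emb).item())
```

In a sketch like this, retrieval is nearest-neighbor search: a spoken query is embedded once and compared against precomputed image embeddings by dot product (equivalent to cosine similarity, since both branches are L2-normalized). The same space suggests how the interlingua idea works: spoken words in two languages that consistently co-occur with the same visual content end up near the same image embeddings, so candidate translations surface as cross-language neighbors through the visual space.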
Contact us
If you would like to get in touch about our work, please see the member list below and reach out to one of the group leads directly.
Last updated Dec 04 '17