I hold a B.S. in electrical and computer engineering from the University of Illinois at Urbana-Champaign, and an S.M. and Ph.D. in computer science from MIT.
My research focuses broadly on speech and language processing. In the past, I have worked on automatic language identification, latent topic modeling, text summarization, pronunciation modeling for speech recognition, audio-visual speech recognition, and unsupervised speech pattern discovery.
The current focus of my research is multimodal perception. Human babies possess a remarkable ability to learn language simply by being immersed in the world. They learn to use spoken language to communicate their feelings, needs, and perceptions, and the state of the world, to their caretakers and peers. This language is inescapably grounded in the real world, and thus tightly coupled to other sensory modalities such as vision, touch, and smell.
I am interested in designing unsupervised learning algorithms that can acquire language and learn to perceive the world in a similarly organic way, without necessarily mimicking the mechanisms by which humans do so. I believe that the cross-modal correspondences that exist in the real world can be leveraged to guide this learning, acting as a surrogate for the expert annotations upon which conventional machine learning models rely.