“Efficient Visual Search and Learning”

Speaker: Kristen Grauman, University of Texas at Austin
Date: March 2, 2009
Time: 4:00PM to 5:00PM
Location: 32-G449
Host: Fredo Durand, MIT
Contact: Francis Doughty, 253-4602, doughty@mit.edu
ABSTRACT
Image and video data are rich with meaning, memories, or entertainment, and they can even facilitate communication or scientific discovery. However, our ability to capture and store massive amounts of interesting visual data has outpaced our ability to analyze it. Methods to search and organize images directly based on their visual cues are thus necessary to make them fully accessible. Unfortunately, the complexity of the problem often leads to approaches that will not scale: conventional methods rely on substantial manually annotated training data, or have such high computational costs that the representation or data sources must be artificially restricted. In this talk I will present our work addressing scalable image search and recognition. I will focus on our techniques for fast image matching and retrieval, and introduce an active learning strategy that minimizes the annotations that a human supervisor must provide to produce accurate models.
While generic distance functions are often used to compare image features, we can use a sparse set of similarity constraints to learn metrics that better reflect their underlying relationships. To allow sub-linear time similarity search under the learned metrics, we show how to encode the metric parameterization into randomized locality-sensitive hash functions. Our learned metrics improve accuracy relative to commonly used metric baselines, while our hashing construction enables efficient indexing with learned distances and very large databases. In order to best leverage manual intervention, we show how the system itself can actively choose its desired annotations. Unlike previous work, our approach accounts for the fact that the optimal use of manual annotation may call for a combination of labels at multiple levels of granularity (e.g., a full segmentation on some images and a present/absent flag on others). I will provide results illustrating how these efficient strategies will enable a new class of applications that rely on the analysis of large-scale visual data, such as object recognition, activity discovery, or meta-data labeling.
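As a rough illustration of hashing under a learned metric, the sketch below assumes the learned metric is a Mahalanobis distance with matrix A = G^T G, and hashes each feature vector x with random-hyperplane bits sign(r . (G x)). This is a generic construction offered under those assumptions, not necessarily the one presented in the talk; the names make_hash, build_index, and query, and the matrix G, are hypothetical and chosen only for this example.

```python
import random


def make_hash(G, num_bits, seed=0):
    """Build a hash function producing num_bits sign bits per vector.

    G is an assumed factor of a learned Mahalanobis matrix A = G^T G,
    given as a list of rows. Each bit is 1 if r . (G x) >= 0, else 0,
    for a random Gaussian direction r (random-hyperplane hashing).
    """
    rng = random.Random(seed)
    directions = [[rng.gauss(0.0, 1.0) for _ in range(len(G))]
                  for _ in range(num_bits)]

    def hash_fn(x):
        # Map x into the space where the learned metric is Euclidean.
        gx = [sum(g * xi for g, xi in zip(row, x)) for row in G]
        # One bit per random hyperplane.
        return tuple(1 if sum(r * v for r, v in zip(d, gx)) >= 0 else 0
                     for d in directions)

    return hash_fn


def build_index(points, hash_fn):
    """Group point indices into buckets keyed by their hash code."""
    table = {}
    for i, p in enumerate(points):
        table.setdefault(hash_fn(p), []).append(i)
    return table


def query(table, hash_fn, q):
    """Return indices of stored points that share q's bucket."""
    return table.get(hash_fn(q), [])


if __name__ == "__main__":
    G = [[2.0, 0.0], [0.0, 0.5]]  # toy stand-in for a learned metric factor
    h = make_hash(G, num_bits=8, seed=1)
    points = [[1.0, 0.2], [-0.5, 3.0], [2.0, 0.4]]
    table = build_index(points, h)
    print(query(table, h, [1.0, 0.2]))
```

Only the candidates in the matching bucket need to be compared exactly, which is what makes the search sub-linear in the database size. In practice several independent tables are typically used so that near neighbors that miss in one table can still collide in another.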
Bio: Kristen Grauman is a Clare Boothe Luce Assistant Professor in the Department of Computer Sciences at the University of Texas at Austin. Before joining UT-Austin in 2007, she received the Ph.D. and S.M. degrees from the MIT Computer Science and Artificial Intelligence Laboratory. Her research in computer vision and machine learning focuses on visual search and recognition. She is a Microsoft Research New Faculty Fellow, and a recipient of an NSF CAREER award and the Frederick A. Howes Scholar Award in Computational Science.
Series: CS Special Seminar Series Spring 2009