Thesis Defense: Computational Perception for Multimodal Document Understanding
Hosts: Fredo Durand & Aude Oliva
MIT CSAIL
Multimodal documents take a variety of forms, such as graphs in technical reports, diagrams in textbooks, and graphic designs in bulletins. Humans can efficiently parse the visual and textual information they contain to make decisions in domains including business, healthcare, and science. Building computational tools to understand multimodal documents therefore has important applications in web search, information retrieval, automatic captioning, and design tools. In this talk, I will discuss our machine learning approaches for detecting and parsing the visual and textual elements of multimodal documents for topic prediction and automatic summarization. Inspired by human perception, I will also present our models that predict where people look in graphic designs and information visualizations, along with the interactive design applications they enable. The work in this thesis makes contributions to the fields of human vision, computer vision, and human-computer interaction.
Committee: Fredo Durand, Aude Oliva, Hanspeter Pfister, Rob Miller, Bill Freeman