Thesis Defense: Computational Perception for Multimodal Document Understanding
Hosts: Fredo Durand & Aude Oliva
MIT CSAIL
Multimodal documents take a variety of forms, such as graphs in technical reports, diagrams in textbooks, and graphic designs in bulletins. Humans can efficiently parse the visual and textual information they contain to make decisions in domains including business, healthcare, and science. Building computational tools to understand multimodal documents therefore has important applications in web search, information retrieval, automatic captioning, and design tools. In this talk, I will discuss our machine learning approaches for detecting and parsing the visual and textual elements of multimodal documents for topic prediction and automatic summarization. Inspired by human perception, I will also present our models that predict where people look in graphic designs and information visualizations, along with the interactive design applications they enable. The work in this thesis makes contributions to the fields of human vision, computer vision, and human-computer interaction.
Committee: Fredo Durand, Aude Oliva, Hanspeter Pfister, Rob Miller, Bill Freeman