CSAIL Event Calendar: Previous Series
Grounding the Lexical Semantics of Verbs in Visual Perception using Force Dynamics and Event Logic
Speaker: Jeffrey Mark Siskind, NEC Research Institute, Inc.
In this talk, I will present an implemented system, called Leonard, that classifies simple spatial motion events, such as "pick up" and "put down", from video input. Unlike previous systems that classify events based on their motion profile, Leonard uses changes in the state of force-dynamic relations, such as support, contact, and attachment, to distinguish between event types. Since force-dynamic relations are not visible, Leonard must construct interpretations of its visual input that are consistent with a physical theory of the world. Leonard models the physics of the world via kinematic stability analysis and performs model reconstruction via prioritized circumscription over this analysis. I will give an overview of the entire system, along with the details of both the model reconstruction process and the subsequent event-logic inference algorithm, which infers occurrences of compound events from occurrences of primitive events. This inference algorithm uses a novel representation, called spanning intervals, to concisely represent the large interval sets that arise when representing liquid and semi-liquid events. I will illustrate how Leonard handles a variety of complex visual-input scenarios that cannot be handled by approaches based on motion profile, including extraneous objects in the field of view, sequential and simultaneous event occurrences, and non-occurrence of events. I will also present a live example illustrating the end-to-end performance of Leonard classifying an event from video input.
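
To give a rough feel for why spanning intervals are concise, here is a minimal sketch, not Leonard's actual implementation. It assumes integer frame indices and closed endpoints only, and the names `SpanningInterval`, `liquid`, and `intersect` are hypothetical. A single 4-tuple stands for every interval [q, r] whose start q lies in [i, j] and whose end r lies in [k, l]. A liquid event that holds over [a, b] also holds over every subinterval, so its quadratically many occurrences collapse into one tuple.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class SpanningInterval:
    """Simplified stand-in: all intervals [q, r] with
    i <= q <= j, k <= r <= l, and q <= r (closed integer endpoints only)."""
    i: int
    j: int
    k: int
    l: int

    def contains(self, q: int, r: int) -> bool:
        # Membership test without enumerating the represented set.
        return self.i <= q <= self.j and self.k <= r <= self.l and q <= r

    def intervals(self) -> Iterator[Tuple[int, int]]:
        # Explicit enumeration, shown only to illustrate the size of the set.
        for q in range(self.i, self.j + 1):
            for r in range(max(q, self.k), self.l + 1):
                yield (q, r)

    def intersect(self, other: "SpanningInterval") -> Optional["SpanningInterval"]:
        # Intervals that belong to both sets; None if there are none.
        i, j = max(self.i, other.i), min(self.j, other.j)
        k, l = max(self.k, other.k), min(self.l, other.l)
        if i > j or k > l or i > l:
            return None
        return SpanningInterval(i, j, k, l)


def liquid(a: int, b: int) -> SpanningInterval:
    """All subintervals of [a, b]: the occurrences of a liquid event
    (hypothetical helper) that holds throughout [a, b]."""
    return SpanningInterval(a, b, a, b)


holding = liquid(0, 3)
explicit = list(holding.intervals())
```

In this sketch, `liquid(0, 3)` stores four integers while `explicit` lists every subinterval of frames 0 through 3, and intersecting two such sets is a constant-time operation on the endpoints rather than a pass over both enumerations.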