EI Seminar - Octavia Camps - Frugal, Interpretable, Dynamics-Inspired Architectures for Sequence Analysis

Speaker: Octavia Camps, Northeastern University

Host: Terry Suh, CSAIL

Title: Frugal, Interpretable, Dynamics-Inspired Architectures for Sequence Analysis



Abstract:

One of the long-term objectives of Machine Learning is to endow machines with the capacity to structure and interpret the world as we do. This is particularly challenging for scenes involving time series, such as video sequences, since seemingly different data can correspond to the same underlying dynamics.

In this talk, I will discuss how we can leverage concepts from dynamical systems theory to design frugal and interpretable architectures for sequence analysis, classification, prediction, and manipulation. I will illustrate these ideas with two examples.

First, I will show how we can incorporate view invariance while designing computer vision architectures for cross-view action recognition. The central theme of this approach is the use of dynamical models, and their associated invariants, as an information-autoencoding unsupervised learning paradigm. This framework is flexible and can be used with different input modalities: RGB, 3D skeletons, or both. Comparisons against current state-of-the-art methods on four widely used benchmark datasets show that this approach achieves state-of-the-art performance for all input modalities and significantly closes the performance gap between RGB-based and 3D-skeleton-based approaches.
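
To make the role of dynamics-based invariants concrete, the sketch below is a toy illustration of the general principle, not the architecture presented in the talk (the helper `fit_transition` is my own). It fits a linear transition matrix to a feature trajectory and to the same trajectory after an invertible linear change of coordinates, used here as a stand-in for a view change. Because the change of coordinates only conjugates the fitted matrix, its eigenvalues, the dynamic invariants, are identical in both "views".

```python
# Toy illustration (not the speaker's method): eigenvalues of a fitted linear
# dynamical model are invariant to an invertible linear "view" change.
import numpy as np

rng = np.random.default_rng(0)

# Simulate a feature trajectory x_{t+1} = A x_t from a stable linear system.
d, T = 4, 50
A = rng.standard_normal((d, d))
A = 0.9 * A / np.max(np.abs(np.linalg.eigvals(A)))   # rescale so the system is stable
X = np.zeros((d, T))
X[:, 0] = rng.standard_normal(d)
for t in range(T - 1):
    X[:, t + 1] = A @ X[:, t]

# The same trajectory observed in another "view": an invertible re-coordinatization.
P = rng.standard_normal((d, d))
Y = P @ X

def fit_transition(Z):
    """Least-squares fit of M such that Z[:, 1:] ~ M @ Z[:, :-1]."""
    return Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])

lam_x = np.sort_complex(np.linalg.eigvals(fit_transition(X)))
lam_y = np.sort_complex(np.linalg.eigvals(fit_transition(Y)))
print(np.allclose(lam_x, lam_y))   # True: the spectrum is invariant to the view change
```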

In the second example, I will introduce a framework, inspired by recent results in nonlinear system identification, that decomposes a video into its moving objects, their attributes, and the dynamic modes of their trajectories. The framework captures the dynamic information as the eigenvalues and eigenvectors of a Koopman operator, which provide an interpretable and parsimonious representation. This decomposition can then be used to perform video analytics, predict future frames, and generate and manipulate synthetic videos.
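
For readers less familiar with Koopman-based representations, the following sketch shows plain Dynamic Mode Decomposition (DMD), a standard data-driven, finite-dimensional approximation of the Koopman operator. It is only a generic illustration of the eigenvalue/eigenvector ("mode") representation mentioned above, assuming flattened frames as observables; the framework presented in the talk additionally decomposes the scene into objects and attributes, which this toy code (the function `dmd_modes` is my own) does not attempt.

```python
# Generic DMD sketch: a finite-dimensional, data-driven approximation of the
# Koopman operator, yielding eigenvalues (temporal behavior) and modes (spatial
# structure). Illustrative only; not the framework presented in the talk.
import numpy as np

def dmd_modes(frames, rank):
    """frames: array of shape (T, H, W) or (T, D); returns DMD eigenvalues and modes."""
    X = frames.reshape(frames.shape[0], -1).T          # D x T snapshot matrix
    X1, X2 = X[:, :-1], X[:, 1:]                       # snapshot pairs x_t -> x_{t+1}
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)  # low-rank subspace of the data
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    A_tilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)  # reduced operator
    eigvals, W = np.linalg.eig(A_tilde)                # DMD (approximate Koopman) eigenvalues
    modes = X2 @ Vh.conj().T @ np.diag(1.0 / s) @ W    # spatial modes (eigenvectors)
    return eigvals, modes

# Toy usage: a synthetic "video" whose pixels follow one decaying oscillation.
rng = np.random.default_rng(1)
T, D = 60, 100
t = np.arange(T)[:, None]
v1, v2 = rng.standard_normal((1, D)), rng.standard_normal((1, D))
frames = np.exp(-0.02 * t) * (np.cos(0.3 * t) * v1 + np.sin(0.3 * t) * v2)
eigvals, modes = dmd_modes(frames, rank=2)
print(np.abs(eigvals))   # both ~ exp(-0.02): decaying modes; the angle encodes frequency
```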

Bio: Octavia Camps received B.S. degrees in computer science and in electrical engineering from the Universidad de la Republica (Uruguay), and M.S. and Ph.D. degrees in electrical engineering from the University of Washington. Since 2006, she has been a Professor in the Electrical and Computer Engineering Department at Northeastern University. From 1991 to 2006, she was a faculty member in Electrical Engineering and in Computer Science and Engineering at The Pennsylvania State University. Prof. Camps was a visiting researcher in the Computer Science Department at Boston University during Spring 2013, and in 2000 she was a visiting faculty member at the California Institute of Technology and at the University of Southern California. She is an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and a General Chair of IEEE/CVF Computer Vision and Pattern Recognition (CVPR) 2024. Her main research interests include dynamics-based computer vision, machine learning, and image processing. In particular, her work seeks data-driven dynamic representations of high-dimensional temporal sequences that are compact, physically meaningful, and capture causal relationships. Combining recent deep learning developments with concepts from dynamic systems identification, she has developed models and algorithms for a range of video analytics applications, including human re-identification, visual tracking, action recognition, video generation, and medical imaging.