We are developing machine-learning technology to help users efficiently run data queries over large archives of raw video.

As video is captured in increasingly large volumes, new types of software are needed so that users can query it and extract relevant information quickly and efficiently. Traditionally, query processing has been applied to databases, where SQL queries extract simple information from tables of data. Similarly simple information, such as the number of vehicles that run red traffic lights, can also be extracted by running queries over large archives of raw video. Until recently, however, running the computer-vision algorithms needed to process that video has been computationally expensive and resource-intensive.

The technology that we have developed applies machine-learning algorithms that efficiently identify objects within video streams. Our software implements both object tracking and query processing for automated analysis of video.

To speed up queries while maintaining accuracy, our approach processes video at low framerates when possible. Rather than running algorithms on every frame of video to answer simple questions, we focus on regions of video, so that frames that cannot satisfy the query predicate are pruned. These pruning heuristics dramatically reduce the number of frames that need to be processed, while a more robust object tracker refines tracks with finer-grained detections.
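As a rough illustration of this kind of pruning, the sketch below samples a video at a low framerate and keeps only the frames near a sample that might satisfy the predicate. It is a minimal sketch under stated assumptions, not our actual implementation: the function `candidate_frames`, the `stride` parameter, and the `may_match` callback (standing in for a cheap detector) are all hypothetical names chosen for this example.

```python
from typing import Callable, List


def candidate_frames(
    num_frames: int,
    stride: int,
    may_match: Callable[[int], bool],
) -> List[int]:
    """Return the frame indices worth processing at full framerate.

    Hypothetical helper: one frame in every `stride` is checked with
    `may_match`, a stand-in for a cheap detector. When a sampled frame
    may satisfy the predicate, the frames within one stride on either
    side are kept; every other frame is pruned.
    """
    keep = set()
    for i in range(0, num_frames, stride):
        if may_match(i):
            lo = max(0, i - stride + 1)
            hi = min(num_frames, i + stride)
            keep.update(range(lo, hi))
    return sorted(keep)


# Toy data: an object of interest is visible in frames 40 through 50.
object_present = [40 <= i <= 50 for i in range(100)]
frames = candidate_frames(100, 10, lambda i: object_present[i])
print(len(frames), frames[0], frames[-1])
```

In this toy case the coarse pass checks 10 frames and keeps 29 of the 100 for finer-grained processing. The trade-off is that an event shorter than `stride` frames can fall between two samples and be missed, which is why the sampling rate has to be chosen with accuracy in mind.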

We are also looking into a self-supervised learning approach to automate object tracking in video, rather than relying on manual hand-labeling of every instance of an object type (such as cars, pedestrians, or animals). Through these methods and other optimization tools, our project aims to reduce the effects of weak labeling and self-occlusion, and to accelerate the execution of object-track queries with novel software.

Our goal is to create a self-consistent model across varying inputs: for example, training one model to track objects based on spatial features, and another model using only visual features. We are continuing to develop tools that express different approaches to solving different tasks, for use in a variety of applications such as traffic planning and behavior research in autonomous driving. As our tracking software becomes more sophisticated, we are able to run much more efficient queries over larger volumes of video data.