Drift-Free Visual Tracking and Egomotion

In collaboration with LP Morency and Trevor Darrell

Many visual trackers lose track of their target over time. To prevent this, our trackers determine the pose of the object by matching each image of the object against an object model, trying various transformations of the model. The transformation that produces the best match with the image is reported as the pose of the object, and the model is refined online as tracking proceeds. Our contribution is the use of object models that consist of a collection of pose-annotated keyframes.
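To make the matching idea concrete, here is a toy sketch. It is not the trackers' actual implementation. Poses are reduced to integer 2D image shifts (the real trackers recover 6 degrees of freedom). "Trying various transformations" becomes an exhaustive search over shifts scored by sum of squared differences, and the model is a list of pose-annotated keyframes. The function names (`ssd`, `best_shift`, `track`) are hypothetical. In this sketch each frame is registered directly against stored keyframes rather than only against the previous frame, so its pose estimate does not depend on a chain of earlier estimates.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized images."""
    d = a.astype(float) - b.astype(float)
    return float(np.sum(d * d))

def best_shift(keyframe, image, max_shift=3):
    """Try every integer (dy, dx) shift of `keyframe` within +/- max_shift
    and return ((dy, dx), score) for the shift that best matches `image`.
    np.roll wraps around at the borders, a simplification for this toy."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = np.roll(keyframe, shift=(dy, dx), axis=(0, 1))
            score = ssd(candidate, image)
            if best is None or score < best[1]:
                best = ((dy, dx), score)
    return best

def track(keyframes, image, max_shift=3):
    """keyframes: list of (keyframe_image, (y, x)) pose-annotated pairs.
    The pose of `image` is the pose of the best-matching keyframe plus the
    shift found for it, not the end of a frame-to-frame chain."""
    best_pose, best_score = None, None
    for key_img, (ky, kx) in keyframes:
        (dy, dx), score = best_shift(key_img, image, max_shift)
        if best_score is None or score < best_score:
            best_pose, best_score = (ky + dy, kx + dx), score
    return best_pose

# Made-up example: a random texture stored as a keyframe at pose (0, 0),
# and a shifted copy of it stored as a second keyframe at pose (3, 3).
rng = np.random.default_rng(0)
key0 = rng.random((16, 16))
keyframes = [(key0, (0, 0)), (np.roll(key0, (3, 3), axis=(0, 1)), (3, 3))]
frame = np.roll(key0, (5, 5), axis=(0, 1))  # too far for keyframe 0 alone
print(track(keyframes, frame))  # -> (5, 5)
```

The frame lies outside the search range of the first keyframe, but the second keyframe matches it exactly. This is one reason a collection of keyframes at different poses is more useful than a single reference image.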

We have built several trackers based on this principle: a head tracker and various egomotion estimators. Our head tracker relies on stereo cameras from Videre Design. It runs at about 14 frames per second and is accurate to within a few degrees of rotation and a few millimeters of translation when the subject is 1 to 2 meters from the camera.

A few frames from our tracker (frame numbers shown). The 6 recovered degrees of freedom are used to overlay a cube on the subject's head.
The rotations recovered by our visual tracker are close to those recovered by an inertial sensor mounted on the subject's head. From top to bottom: rotations about the X, Y, and Z axes. The estimated rotations agree with the inertial sensor to within its error tolerance (<3 degrees).

An example video (7.5 MB). The number of dots under the box represents the number of keyframes used to estimate the pose in that frame; see the papers for more detail. See also another video (11.4 MB). This illustrative video (46 MB) shows how the poses of the keyframes (squares on the grid) are refined over time.
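When several keyframes contribute to a frame's pose, their estimates have to be combined, and the keyframe poses themselves can be revised as new measurements arrive. The sketch below is a deliberately simplified, hypothetical illustration of that idea. It assumes scalar poses (for example a single rotation angle) and independent Gaussian measurement noise. Each pairwise registration says "pose j minus pose i is about d, with variance var", and all poses are solved jointly by weighted least squares with the first pose fixed at 0. The actual trackers work with full 6-DOF poses, and `solve_poses` is not their code.

```python
import numpy as np

def solve_poses(n, measurements, anchor=0):
    """Weighted least-squares estimate of n scalar poses (hypothetical helper).

    measurements: list of (i, j, d, var), each meaning
        pose[j] - pose[i] ~= d   with noise variance var.
    Pose `anchor` is fixed to 0, since relative measurements alone only
    determine the poses up to a common offset.
    """
    rows, rhs, w = [], [], []
    for i, j, d, var in measurements:
        row = np.zeros(n)
        row[j] += 1.0
        row[i] -= 1.0
        rows.append(row)
        rhs.append(d)
        w.append(1.0 / np.sqrt(var))  # whiten: scale each row by 1/sigma
    w = np.array(w)
    A = np.array(rows) * w[:, None]
    b = np.array(rhs) * w
    # Drop the anchor's column: its value is fixed at 0.
    free = [k for k in range(n) if k != anchor]
    sol, *_ = np.linalg.lstsq(A[:, free], b, rcond=None)
    poses = np.zeros(n)
    poses[free] = sol
    return poses

# Made-up example: true poses 0, 10, 20, 30 degrees, and every
# frame-to-frame registration is biased by +1 degree.
chain = [(0, 1, 11.0, 1.0), (1, 2, 11.0, 1.0), (2, 3, 11.0, 1.0)]
print(solve_poses(4, chain))                        # chained estimates drift
print(solve_poses(4, chain + [(0, 3, 30.0, 1.0)]))  # plus a link to keyframe 0
```

With the chain alone, the last pose comes out at 33, which is 3 degrees of accumulated drift. Adding a single registration of frame 3 against keyframe 0 gives 10.25, 20.5 and 30.75, and the correction is spread across the whole chain rather than applied only to the last frame.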

Apart from segmentation, there is little difference between tracking an object with a stationary camera and tracking a camera moving through a stationary scene. We have therefore also used view-based appearance models to estimate camera motion (egomotion).
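The symmetry can be made concrete with rigid transforms. Suppose a tracker reports the pose of a stationary scene in the camera's frame as a 4×4 transform T, with rotation R and translation t. Then the camera's pose in the scene's frame is T⁻¹, whose rotation is Rᵀ and whose translation is −Rᵀt. The minimal numpy sketch below shows this; the helper names (`rigid`, `invert_rigid`, `rot_z`) and the example pose are made up for illustration.

```python
import numpy as np

def rigid(R, t):
    """Pack a 3x3 rotation R and translation t into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_rigid(T):
    """Closed-form inverse of a rigid transform: (R, t) -> (R^T, -R^T t)."""
    R, t = T[:3, :3], T[:3, 3]
    return rigid(R.T, -R.T @ t)

def rot_z(deg):
    """Rotation by `deg` degrees about the Z axis."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Made-up pose of a stationary scene in camera coordinates
# (maps scene points to camera points).
T_scene_in_cam = rigid(rot_z(30.0), [0.1, 0.0, 1.5])
# The same information read the other way round: camera pose in scene coordinates.
T_cam_in_scene = invert_rigid(T_scene_in_cam)
print(np.allclose(T_scene_in_cam @ T_cam_in_scene, np.eye(4)))  # True
```

Using the closed-form inverse instead of a general matrix inverse keeps the result exactly rigid, with an orthonormal rotation block.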

The following papers describe these trackers in detail.

  • Reducing Drift in Differential Tracking, A. Rahimi, L-P Morency, T. Darrell, in Computer Vision and Image Understanding (CVIU) 2006. (pdf)
  • Adaptive View-Based Appearance Models, L-P Morency, A. Rahimi, T. Darrell, in CVPR 2003. (pdf)
  • Location Estimation with a Differential Update Network, A. Rahimi, T. Darrell, in NIPS 2002. (pdf)
  • Reducing Drift in Parametric Motion Tracking, A. Rahimi, L-P Morency, T. Darrell, in ICCV 2001. (pdf)

The head tracker is now maintained by LP Morency. Louis-Philippe enhanced the original registration algorithm by reimplementing it in C++ (to make it real-time) and combining it with the ICP (Iterative Closest Point) 3D registration algorithm to improve accuracy. The original registration algorithm and LP's enhancement are described in:

  • 3D pose tracking with linear depth and brightness constraints, M. Harville, A. Rahimi, T. Darrell, G. Gordon, J. Woodfill, in ICCV, vol. 1, pp.206-213, Sep 1999.
  • Stereo Tracking using ICP and Normal Flow Constraint, L-P Morency, T. Darrell, in ICPR 2002.

We have used the tracker in various human-computer interaction applications:

  • Fast Stereo-Based Head Tracking for Interactive Environments, L-P Morency, A. Rahimi, N. Checka, T. Darrell, in Proceedings of the Int. Conference on Automatic Face and Gesture Recognition, 2002. (pdf)
  • Face-responsive interfaces: from direct manipulation to perceptive presence, D. Demirdjian, K. Tollmar, F. Bentley, N. Checka, L-P Morency, A. Rahimi, A. Oh, T. Darrell, in UBICOMP'02, 2002.

See LP's page for examples of the tracker being used to obtain feedback for interactive dialog systems.
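For readers unfamiliar with ICP, the following is a textbook point-to-point ICP sketch in numpy. It alternates brute-force nearest-neighbor matching with a closed-form rigid alignment (the Kabsch/SVD solution). It illustrates only the generic ICP step. It is not LP's tracker, which combines ICP with the normal flow constraint on stereo data, and the function names (`best_rigid_transform`, `icp`) are hypothetical.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t with R @ src[i] + t ~= dst[i] (Kabsch / SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t

def icp(src, dst, iters=20):
    """Point-to-point ICP: alternate nearest-neighbor matching and Kabsch.
    Returns the accumulated R, t mapping `src` onto `dst`."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest neighbor in dst for every point of cur.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        # Compose: new = R (R_total p + t_total) + t.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

# Made-up check: move a random point cloud by a small known rigid motion
# and try to recover it.
rng = np.random.default_rng(1)
src = rng.random((30, 3)) - 0.5
a = np.deg2rad(1.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.01, -0.005, 0.005])
dst = src @ R_true.T + t_true
R, t = icp(src, dst)
print(np.allclose(R, R_true, atol=1e-6), np.allclose(t, t_true, atol=1e-6))
```

The brute-force matching costs O(NM) per iteration, which is acceptable for a sketch; larger point clouds would normally use a spatial index such as a k-d tree. ICP also only converges from a reasonably close starting alignment, which in a tracker would typically come from the previous frame's pose.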