Co-Adaptation of Audio-Visual Speech and Gesture Recognizers

Speaker: C. Mario Christoudias, MIT CSAIL
Date: October 23, 2006
Time: 4:00PM to 5:00PM
Location: 32-G449
Host: Daniel Myers, MIT CSAIL
Contact: Daniel Myers, dsmyers@mit.edu
The construction of robust multimodal interfaces often requires large amounts
of labeled training data to account for cross-user differences and variation in
the environment. In this work, we investigate whether unlabeled training data
can be leveraged to build more reliable audio-visual classifiers through
co-training, a multi-view learning algorithm. Multimodal tasks are good
candidates for multi-view learning, since each modality provides a potentially
redundant view to the learning algorithm. We apply co-training to two problems:
audio-visual speech unit classification, and user agreement recognition using
spoken utterances and head gestures. We demonstrate that multimodal co-training
can be used to learn from only a few labeled examples in one or both of the
audio-visual modalities. We also propose a co-adaptation algorithm, which
adapts existing audio-visual classifiers to a particular user or noise
condition by leveraging the redundancy in the unlabeled data.
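
For readers unfamiliar with co-training, the general idea can be sketched as follows. This is a minimal, generic illustration and not the speaker's implementation: it assumes each modality is reduced to a single number per example, uses a hypothetical nearest-centroid classifier per view, and runs on synthetic data. All function names and parameters here are made up for illustration. In each round, the classifier trained on one view labels the unlabeled examples it is most confident about, and those examples are added to the shared training set so that the other view can learn from them.

```python
import random


def centroid_fit(xs, ys):
    """Fit a 1-D nearest-centroid classifier: one mean per class label."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return means


def centroid_predict(means, x):
    """Return (label, confidence). Confidence is the distance margin
    between the nearest and second-nearest class centroid."""
    ranked = sorted(means, key=lambda c: abs(x - means[c]))
    best = ranked[0]
    if len(ranked) < 2:
        return best, 0.0
    margin = abs(x - means[ranked[1]]) - abs(x - means[best])
    return best, margin


def fit_view(labeled, view):
    """Train the classifier for one view (0 = audio, 1 = video)."""
    return centroid_fit([f[view] for f, _ in labeled],
                        [y for _, y in labeled])


def co_train(labeled, unlabeled, rounds=25, per_round=4):
    """labeled: list of ((audio, video), label); unlabeled: list of
    (audio, video). Returns one trained model per view."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            model = fit_view(labeled, view)
            # Rank the pool by this view's confidence, most confident first.
            ranked = sorted(pool,
                            key=lambda f: centroid_predict(model, f[view])[1],
                            reverse=True)
            for feats in ranked[:per_round]:
                label, _ = centroid_predict(model, feats[view])
                labeled.append((feats, label))  # pseudo-label, seen by both views
                pool.remove(feats)
    return [fit_view(labeled, view) for view in (0, 1)]


def predict(models, feats):
    """Combine views by trusting whichever view is more confident."""
    guesses = [centroid_predict(m, feats[v]) for v, m in enumerate(models)]
    return max(guesses, key=lambda g: g[1])[0]


def accuracy(models, test):
    return sum(predict(models, f) == y for f, y in test) / len(test)


def make_data(seed=0, n_unlabeled=200, n_test=100):
    """Synthetic two-class data; both views are noisy copies of the class."""
    rng = random.Random(seed)

    def sample(label):
        center = 6.0 * label
        return (rng.gauss(center, 1.0), rng.gauss(center, 1.0))

    labeled = [((0.0, 0.0), 0), ((6.0, 6.0), 1)]  # one labeled example per class
    unlabeled = [sample(i % 2) for i in range(n_unlabeled)]
    rng.shuffle(unlabeled)
    test = [(sample(i % 2), i % 2) for i in range(n_test)]
    return labeled, unlabeled, test


if __name__ == "__main__":
    seed_set, pool, test = make_data()
    models = co_train(seed_set, pool)
    print("held-out accuracy:", accuracy(models, test))
```

The sketch keeps a single shared labeled set for both views, which is one common simplification. Real audio-visual classifiers would use richer features and models, and the selection rule for confident examples is a design choice rather than something stated in the abstract.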
Part of the CSAIL Student Seminar Series, Fall 2006.