CSAIL Event Calendar: Previous Series

Dynamic Bayesian Network Models of Pronunciation for Speech Recognition

Speaker: Karen Livescu, Spoken Language Systems, CSAIL
Date: March 10, 2004
Time: 4:10PM
Location: NE43-941
Contact: Lilla Zollei (3-2986, lzollei@ai.mit.edu), Louis-Philippe Morency (3-4278, lmorency@ai.mit.edu)

Casual speech is characterized by extremely variable pronunciations. For example, a short word such as "about" can have tens of distinct pronunciations. This poses a difficult problem for automatic speech recognition systems. Typical speech recognizers represent word pronunciations as strings of "phones", or basic speech sounds. However, many types of pronunciation variation are more elegantly and compactly represented in terms of changes in sub-phonetic "features" such as the positions of the speech articulators (lips, tongue, etc.). We investigate an approach to pronunciation modeling in which the evolution of such features is explicitly represented. A natural framework for such a model is that of dynamic Bayesian networks (DBNs), which allow us to efficiently implement factored state representations.

In this talk, I will give some examples of usual and unusual pronunciations to motivate the feature-based approach. I will describe the type of DBN we have used in our model, and present recent experimental results.

Series: CSAIL Student Seminar - Spring 2004
