Dynamic Bayesian Network Models of Pronunciation for Speech Recognition
Speaker: Karen Livescu, Spoken Language Systems, CSAIL
Casual speech is characterized by extremely variable pronunciations. For example, a short word such as "about" can have tens of distinct pronunciations. This poses a difficult problem for automatic speech recognition systems. Typical speech recognizers represent word pronunciations as strings of "phones", or basic speech sounds. However, many types of pronunciation variation are more elegantly and compactly represented in terms of changes in sub-phonetic "features" such as the positions of the speech articulators (lips, tongue, etc.). We investigate an approach to pronunciation modeling in which the evolution of such features is explicitly represented. A natural framework for such a model is that of dynamic Bayesian networks (DBNs), which allow us to efficiently implement factored state representations.
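The following minimal Python sketch illustrates what a factored state representation buys. It is not the model described in the talk. It assumes three hypothetical articulatory features (lips, tongue tip, velum) with invented value sets and invented "stay" probabilities. It also treats the features as evolving independently of one another, which a real pronunciation model would not do. The joint transition probability is the product of per-feature factors, so the model stores far fewer parameters than a flat transition matrix over every combination of feature values.

```python
from itertools import product

# Hypothetical articulatory features and value sets (illustrative assumption only).
FEATURES = {
    "lips": ["closed", "narrow", "wide"],
    "tongue_tip": ["closed", "critical", "open"],
    "velum": ["closed", "open"],
}


def uniform_stay(values, stay=0.8):
    """Hypothetical per-feature transition table P(next | current):
    keep the current value with probability `stay`, else move uniformly."""
    n = len(values)
    move = (1.0 - stay) / (n - 1)
    return {a: {b: (stay if a == b else move) for b in values} for a in values}


# One small transition table per feature (the "factors").
TRANSITIONS = {f: uniform_stay(v) for f, v in FEATURES.items()}


def joint_transition(current, nxt):
    """P(next joint state | current joint state), computed as a product of
    per-feature factors. Assumes the features evolve independently."""
    p = 1.0
    for f in FEATURES:
        p *= TRANSITIONS[f][current[f]][nxt[f]]
    return p


def parameter_counts():
    """Return (entries in the per-feature tables, entries in a flat joint table)."""
    sizes = [len(v) for v in FEATURES.values()]
    factored = sum(n * n for n in sizes)
    joint_states = 1
    for n in sizes:
        joint_states *= n
    flat = joint_states * joint_states
    return factored, flat


if __name__ == "__main__":
    cur = {"lips": "closed", "tongue_tip": "open", "velum": "closed"}
    # Sum over every joint next state; each factor is a distribution, so this is ~1.
    total = sum(
        joint_transition(cur, dict(zip(FEATURES, combo)))
        for combo in product(*FEATURES.values())
    )
    print("sum over next states:", total)
    print("factored vs flat parameters:", parameter_counts())
```

With three features of sizes 3, 3, and 2, the per-feature tables hold 22 entries, while a flat table over the 18 joint states would hold 324. A DBN pronunciation model would add dependencies between the feature variables, for example constraints on how far apart in time they may drift. It keeps the same advantage: the model never has to enumerate the joint state space explicitly.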