CSAIL Event Calendar


Distributed Hessian-free Optimization for Deep Neural Network Acoustic Models

Speaker: Brian Kingsbury, IBM T. J. Watson Research Center
Date: Monday, November 5, 2012
Time: 4:00PM to 5:00PM
Refreshments: 3:45PM
Location: 32-G882 (Stata Center - Hewlett Room)
Host: Jim Glass, MIT CSAIL
Contact: Marcia Davidson, 617-253-3049, marcia@csail.mit.edu

Neural network acoustic models have recently enjoyed a renaissance, with multiple research groups finding that they outperform state-of-the-art Gaussian mixture model (GMM) acoustic models on a wide variety of tasks. Three architectural factors distinguish modern neural network acoustic models from previous such models: (1) they are deep, typically using five or more hidden layers; (2) they are wide, using thousands of units per hidden layer; and (3) they classify audio features into thousands of context-dependent HMM state targets. Together, these factors mean that neural network acoustic models have a large number of parameters, typically on the order of tens of millions. Additionally, these models may be trained using sequence-discriminative criteria such as maximum mutual information or minimum Bayes risk. The result is that training such models using standard stochastic gradient descent is a slow process, requiring weeks of compute time. In contrast, standard GMM acoustic models can be trained in a few days, thanks to parallelization on large compute clusters. In this talk, I will describe a distributed neural network training algorithm, based on Hessian-free optimization (Martens, ICML 2010), that can take advantage of large compute clusters to scale to deep networks and large data sets. Using examples from broadcast news transcription, Switchboard transcription, and Babel Cantonese transcription and keyword search, I will show how Hessian-free optimization can reduce training times and improve system performance.

Brian Kingsbury is a research staff member at the IBM T. J. Watson Research Center. He joined IBM Research in 1999 after completing his PhD at the University of California, Berkeley. He is co-PI and technical lead for the LORELEI consortium, an IBM-led group working on the IARPA Babel program. Brian is currently an associate editor for IEEE Transactions on Audio, Speech, and Language Processing. He was a member of the Speech and Language Technical Committee of the IEEE Signal Processing Society from 2009 to 2011, and he served as a speech area chair for the 2010, 2011, and 2012 ICASSP conferences. His research interests include deep neural network acoustic modeling, large-vocabulary speech transcription, and keyword search in audio.


