CSAIL Event Calendar
Distributed Hessian-free Optimization for Deep Neural Network Acoustic Models

Speaker: Brian Kingsbury, IBM T. J. Watson Research Center
Date: Monday, November 5, 2012
Time: 4:00PM to 5:00PM
Refreshments: 3:45PM
Location: 32-G882 (Stata Center - Hewlett Room)
Host: Jim Glass, MIT CSAIL
Contact: Marcia Davidson, 617-253-3049, marcia@csail.mit.edu

Neural network acoustic models have recently enjoyed a renaissance, with multiple research groups finding that they outperform state-of-the-art Gaussian mixture model (GMM) acoustic models on a wide variety of tasks. Three architectural factors distinguish modern neural network acoustic models from earlier ones: (1) they are deep, typically using five or more hidden layers; (2) they are wide, using thousands of units per hidden layer; and (3) they classify audio features into thousands of context-dependent HMM state targets. Together, these factors mean that neural network acoustic models have a large number of parameters, typically on the order of tens of millions. Additionally, these models may be trained using sequence-discriminative criteria such as maximum mutual information or minimum Bayes risk.

The result is that training such models using standard stochastic gradient descent is a slow process, requiring weeks of compute time. In contrast, standard GMM acoustic models can be trained in a few days, thanks to parallelization on large compute clusters.

In this talk, I will describe a distributed neural network training algorithm, based on Hessian-free optimization (Martens, ICML 2010), that can take advantage of large compute clusters to scale to deep networks and large data sets. Using examples from broadcast news transcription, Switchboard transcription, and Babel Cantonese transcription and keyword search, I will show how Hessian-free optimization can reduce training times and improve system performance.







