Wei-Ning Hsu is a third-year PhD student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, working with Dr. James Glass in the Spoken Language Systems Group. His research focuses on unsupervised learning of interpretable representations from sequential data, primarily but not limited to speech, using neural generative models. Centered on this theme, he also explores applications to robust automatic speech recognition, unsupervised domain adaptation, speaker verification, voice conversion, audio denoising, and acoustic unit discovery.

Previously, he worked on developing deep recurrent neural networks for speech recognition, attention-based sequence-to-sequence models for community question answering, deep clustering models for speech separation, unsupervised spoken term detection, and active learning. He received his B.S. in Electrical Engineering from National Taiwan University in 2014. As an undergraduate researcher, he worked with Prof. Lin-Shan Lee in the Speech Processing Lab and Prof. Hsuan-Tien Lin in the Computational Learning Lab.

Research Areas

Impact Areas

Automatic Speech Recognition

Automatic speech recognition (ASR) has been a grand-challenge machine learning problem for decades. Our ongoing research in this area examines the use of deep learning models for distant and noisy recording conditions, as well as multilingual and low-resource scenarios.


Unsupervised Learning of Interpretable Representations from Sequential Data

Generation of sequential data involves multiple factors operating at different temporal scales. Take natural speech as an example: the speaker identity tends to be consistent within an utterance, while the phonetic content changes from frame to frame. By explicitly modeling such a hierarchical generative process under a probabilistic framework, we proposed a model that learns to factorize sequence-level factors and sub-sequence-level factors into different sets of representations without any supervision.
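The hierarchical generative story can be illustrated with a toy NumPy sketch: a sequence-level latent (e.g. speaker identity) is drawn once per utterance, while a segment-level latent (e.g. phonetic content) is drawn anew for each segment. The linear "decoder" and variable names below are illustrative assumptions, not the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_utterance(n_segments, dim=2):
    """Toy hierarchical generative process for one utterance."""
    # Sequence-level latent: drawn ONCE, shared across all segments
    # (captures slowly varying factors such as speaker identity).
    z2 = rng.normal(0.0, 1.0, size=dim)
    segments = []
    for _ in range(n_segments):
        # Segment-level latent: drawn PER SEGMENT
        # (captures fast-changing factors such as phonetic content).
        z1 = rng.normal(0.0, 1.0, size=dim)
        # Observation depends on both factors; a real model would use
        # a learned neural decoder instead of this additive toy one.
        x = z2 + z1 + rng.normal(0.0, 0.1, size=dim)
        segments.append(x)
    return z2, segments

z2, segs = sample_utterance(n_segments=5)
```

Because `z2` is shared within an utterance while `z1` varies segment to segment, an inference model trained under this factorization can separate the two sets of factors without labels.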