Project

Unsupervised Learning of Interpretable Representations from Sequential Data

Generation of sequential data involves multiple factors operating at different temporal scales. Take natural speech, for example: the speaker identity tends to be consistent within an utterance, while the phonetic content changes from frame to frame. By explicitly modeling such a hierarchical generative process under a probabilistic framework, we propose a model that learns to factorize sequence-level and sub-sequence-level factors into different sets of representations without any supervision.
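To make the two temporal scales concrete, here is a minimal ancestral-sampling sketch in Python/NumPy. It assumes a two-level latent structure in the spirit of the NIPS 2017 paper listed below: a sequence-level vector `mu2` drawn once per sequence, a per-segment latent `z2` drawn around `mu2`, and a per-segment latent `z1` with a sequence-independent prior. The variances, dimensions, noise level, and the random linear `decoder_w` (standing in for a neural decoder) are hypothetical values chosen for illustration only.

```python
import numpy as np

# Hypothetical hyperparameters, assumed for illustration only.
SIGMA2_MU2 = 1.0   # variance of the prior on the sequence-level vector mu2
SIGMA2_Z2 = 0.25   # variance of each segment's z2 around its sequence's mu2
SIGMA2_Z1 = 1.0    # variance of the sequence-independent prior on z1
Z_DIM, X_DIM = 4, 8
OBS_STD = 0.1      # assumed observation noise of the toy decoder


def sample_sequence(num_segments, decoder_w, rng):
    """Ancestrally sample one sequence from a toy two-level latent model.

    mu2 is drawn once per sequence, so every segment's z2 is centered on it
    (sequence-level factor, e.g. speaker); z1 is drawn independently for
    each segment (sub-sequence-level factor, e.g. phonetic content).
    """
    mu2 = rng.normal(0.0, np.sqrt(SIGMA2_MU2), size=Z_DIM)
    z1 = rng.normal(0.0, np.sqrt(SIGMA2_Z1), size=(num_segments, Z_DIM))
    z2 = mu2 + rng.normal(0.0, np.sqrt(SIGMA2_Z2), size=(num_segments, Z_DIM))
    # Toy linear "decoder" standing in for a neural network f(z1, z2).
    x_mean = np.concatenate([z1, z2], axis=1) @ decoder_w
    x = x_mean + rng.normal(0.0, OBS_STD, size=x_mean.shape)
    return mu2, z1, z2, x


rng = np.random.default_rng(0)
decoder_w = rng.normal(size=(2 * Z_DIM, X_DIM))
mu2, z1, z2, x = sample_sequence(num_segments=500, decoder_w=decoder_w, rng=rng)
print(x.shape)  # (500, 8)
# The per-segment z2 vectors cluster around the shared mu2.
print(np.abs(z2.mean(axis=0) - mu2).max())
```

Because every segment's `z2` is centered on the same `mu2`, the sequence-level factor is shared across the whole sequence, while `z1` varies freely from segment to segment.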

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors on different sets of latent variables. The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables, and, quantitatively, its ability to outperform an i-vector baseline on speaker verification and to reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition.
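The sequence-dependent prior also suggests how a sequence-level embedding could be read off for speaker verification. As a minimal sketch, assume each segment's latent satisfies z2_n ~ N(mu2, σ²_z2 I) and the sequence-level vector satisfies mu2 ~ N(0, σ²_μ2 I). Gaussian conjugacy then gives the posterior mean E[mu2 | z2_1, ..., z2_N] = Σ_n z2_n / (N + σ²_z2 / σ²_μ2). The variances below are hypothetical, the toy `z2` vectors are synthetic rather than produced by a trained encoder, and cosine scoring is an assumed back-end, not necessarily the one used in the paper.

```python
import numpy as np

# Hypothetical variances, assumed for illustration only.
SIGMA2_MU2 = 1.0   # prior variance of the sequence-level vector mu2
SIGMA2_Z2 = 0.25   # variance of each segment's z2 around mu2


def estimate_mu2(z2):
    """Posterior mean of mu2 given the (N, D) z2 vectors of one sequence.

    With z2_n ~ N(mu2, SIGMA2_Z2 I) and mu2 ~ N(0, SIGMA2_MU2 I), conjugacy
    gives E[mu2 | z2] = sum_n z2_n / (N + SIGMA2_Z2 / SIGMA2_MU2).
    """
    n = z2.shape[0]
    return z2.sum(axis=0) / (n + SIGMA2_Z2 / SIGMA2_MU2)


def cosine_score(a, b):
    """Cosine similarity between two sequence-level embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy check: two synthetic "utterances" share one mu2, a third uses another.
rng = np.random.default_rng(1)
spk_a, spk_b = rng.normal(size=(2, 16))
utt_a1 = spk_a + rng.normal(0.0, np.sqrt(SIGMA2_Z2), size=(200, 16))
utt_a2 = spk_a + rng.normal(0.0, np.sqrt(SIGMA2_Z2), size=(200, 16))
utt_b = spk_b + rng.normal(0.0, np.sqrt(SIGMA2_Z2), size=(200, 16))

same = cosine_score(estimate_mu2(utt_a1), estimate_mu2(utt_a2))
diff = cosine_score(estimate_mu2(utt_a1), estimate_mu2(utt_b))
print(same, diff)
```

The shrinkage term σ²_z2 / σ²_μ2 pulls the estimate toward the prior mean for short sequences and becomes negligible as N grows, so long utterances yield embeddings close to the plain average of their `z2` vectors.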

Group

Spoken Language Systems Group


Last updated Nov 30 '17

Research Areas

AI & ML

Impact Areas

Big Data

Members

Jim Glass

Publications

Hsu, Wei-Ning and Zhang, Yu and Glass, James

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

NIPS 2017

Hsu, Wei-Ning and Zhang, Yu and Glass, James

Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation

ASRU 2017

Hsu, Wei-Ning and Zhang, Yu and Glass, James

Learning Latent Representations for Speech Generation and Transformation

Interspeech 2017