Ever since Mikolov et al. introduced Word2Vec in 2013, there has been a desire to build models that map data into an information-rich embedding space. However, with noisy data or data from multiple channels, the learned embedding space often captures information irrelevant to the latent concepts we are trying to model (whether that be semantic content, speaker information, etc.). For the purpose of latent modelling, this extra information can be considered noise. We explore the use of regularization techniques to limit the amount of information encoded in the embedding. When combined with loss terms that prioritize useful information, we have been able to learn embedding spaces that correlate better with the latent variables we are trying to model. Moreover, these embedding spaces can be learned in a weakly supervised fashion, where the only supervisory signal comes in the form of paired sensory inputs.
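To make this concrete, here is a minimal NumPy sketch of one possible instantiation of the idea: an information-limiting regularizer (a VAE-style KL penalty that pushes Gaussian embeddings toward a standard-normal prior) combined with an alignment term that rewards paired inputs mapping to nearby embeddings. The function names and the specific choice of KL-plus-alignment loss are our illustrative assumptions, not the exact formulation used in this work.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over embedding dims.
    # Acts as the regularizer: it penalizes embeddings that encode more
    # information than the prior allows (a capacity / "rate" term).
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def weakly_supervised_loss(z_a, z_b, mu, logvar, beta=0.1):
    # Alignment term: paired sensory inputs (the only supervision) should
    # land close together in the embedding space.
    align = np.mean(np.sum((z_a - z_b) ** 2, axis=-1))
    # Information-limiting term: average KL of the embedding distribution
    # to the prior, weighted by beta.
    rate = np.mean(kl_to_standard_normal(mu, logvar))
    return align + beta * rate
```

With `beta` controlling the trade-off, a larger weight discards more channel-specific noise at the cost of some task-relevant detail, which is the tension the loss terms above are balancing.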
If you would like to contact us about our work, please visit the people section below and reach out to one of the group leads directly through their people page.