An Analysis and Some Solutions to Challenges in NMT

Speaker

Marc'Aurelio Ranzato

Facebook AI Research Lab

Host

Regina Barzilay

MIT CSAIL

Abstract
There are several well-known challenges in Neural Machine Translation (NMT), such as the performance degradation of larger beams, the under-estimation of rare words, and, perhaps more importantly, the need for large amounts of parallel sentences for training. In this talk, I will first relate some of these challenges to uncertainty in the data, which is due not only to the existence of several semantically equivalent translations of the same source sentence, but also to noise in automatic data collection processes. I will discuss how such uncertainty affects search and to what extent the model distribution matches the data distribution. As part of this analysis, I will introduce simple solutions to some of the above challenges, and conclude with a method for learning to translate by leveraging only monolingual data.

Bio
Marc'Aurelio Ranzato is a Research Scientist at the Facebook AI Research lab in New York City. His research interests are in the areas of unsupervised learning, continual learning, and transfer learning, with applications to vision, natural language understanding, and speech recognition. Marc'Aurelio earned a PhD in Computer Science at New York University under Yann LeCun's supervision. After a post-doc with Geoffrey Hinton at the University of Toronto, he joined the Google Brain team in 2011. In 2013 he joined Facebook and was a founding member of the Facebook AI Research lab.

Date

March 02 2018, 13:00 to 14:00 (America/New_York)

Location

32-G449 (Stata Center - Patil/Kiva Conference Room)

Organizer & Contact

Marcia G. Davidson

marcia@csail.mit.edu

617-253-3049
