Automatic speech recognition (ASR) has been a grand challenge machine learning problem for decades. Our ongoing research in this area examines the use of deep learning models for distant and noisy recording conditions, multilingual, and low-resource scenarios.
All humans process vast quantities of unannotated speech and manage to learn phonetic inventories, word boundaries, etc., and can use these abilities to acquire new word. Why can't ASR technology have similar capabilities? Our goal in this research project is to build speech technology using unannotated speech corpora.
Neural networks, which learn to perform computational tasks by analyzing huge sets of training data, have been responsible for the most impressive recent advances in artificial intelligence, including speech-recognition and automatic-translation systems.
Hyper-connectivity has changed the way we communicate, wait, and productively use our time. Even in a world of 5G wireless and “instant” messaging, there are countless moments throughout the day when we’re waiting for messages, texts, and Snapchats to refresh. But our frustrations with waiting a few extra seconds for our emails to push through doesn’t mean we have to simply stand by.
Every language has its own collection of phonemes, or the basic phonetic units from which spoken words are composed. Depending on how you count, English has somewhere between 35 and 45. Knowing a language’s phonemes can make it much easier for automated systems to learn to interpret speech.In the 2015 volume of Transactions of the Association for Computational Linguistics, CSAIL researchers describe a new machine-learning system that, like several systems before it, can learn to distinguish spoken words. But unlike its predecessors, it can also learn to distinguish lower-level phonetic units, such as syllables and phonemes.