Self-driving cars are likely to be safer, on average, than human-driven cars. But they may fail in new and catastrophic ways that a human driver could prevent. This project is designing a new architecture for a highly dependable self-driving car.
Automatic speech recognition (ASR) has been a grand challenge machine learning problem for decades. Our ongoing research in this area examines the use of deep learning models for distant and noisy recording conditions, multilingual, and low-resource scenarios.
All humans process vast quantities of unannotated speech and manage to learn phonetic inventories, word boundaries, etc., and can use these abilities to acquire new word. Why can't ASR technology have similar capabilities? Our goal in this research project is to build speech technology using unannotated speech corpora.
For all the progress made in self-driving technologies, there still aren’t many places where they can actually drive. Companies like Google only test their fleets in major cities where they’ve spent countless hours meticulously labeling the exact 3-D positions of lanes, curbs, off-ramps, and stop signs.
Neural networks, which learn to perform computational tasks by analyzing huge sets of training data, have been responsible for the most impressive recent advances in artificial intelligence, including speech-recognition and automatic-translation systems.
Light lets us see the things that surround us, but what if we could also use it to see things hidden around corners? It sounds like science fiction, but that’s the idea behind a new algorithm out of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) — and its discovery has implications for everything from emergency response to self-driving cars.
In recent years, a host of Hollywood blockbusters — including “The Fast and the Furious 7,” “Jurassic World,” and “The Wolf of Wall Street” — have included aerial tracking shots provided by drone helicopters outfitted with cameras. Those shots required separate operators for the drones and the cameras, and careful planning to avoid collisions. But a team of researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and ETH Zurich hope to make drone cinematography more accessible, simple, and reliable.
Speech recognition systems, such as those that convert speech to text on cellphones, are generally the result of machine learning. A computer pores through thousands or even millions of audio files and their transcriptions, and learns which acoustic features correspond to which typed words.But transcribing recordings is costly, time-consuming work, which has limited speech recognition to a small subset of languages spoken in wealthy nations.
Every language has its own collection of phonemes, or the basic phonetic units from which spoken words are composed. Depending on how you count, English has somewhere between 35 and 45. Knowing a language’s phonemes can make it much easier for automated systems to learn to interpret speech.In the 2015 volume of Transactions of the Association for Computational Linguistics, CSAIL researchers describe a new machine-learning system that, like several systems before it, can learn to distinguish spoken words. But unlike its predecessors, it can also learn to distinguish lower-level phonetic units, such as syllables and phonemes.