This CoR brings together researchers at CSAIL working across a broad swath of application domains. Within these lie novel and challenging machine learning problems serving science, social science and computer science.
This community is interested in understanding and affecting the interaction between computing systems and society through engineering, computer science and public policy research, education, and public engagement.
Automatic speech recognition (ASR) has been a grand-challenge problem in machine learning for decades. Our ongoing research in this area examines the use of deep learning models for distant and noisy recording conditions, as well as for multilingual and low-resource scenarios.
We study the fundamentals of Bayesian optimization and develop efficient Bayesian optimization methods for the global optimization of expensive black-box functions arising from a range of different applications.
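To make the basic loop concrete, here is a minimal, self-contained sketch of one common textbook variant of Bayesian optimization, not the group's own method. It assumes a Gaussian-process surrogate with a fixed squared-exponential kernel (length scale 0.3, not learned), zero prior mean, the expected-improvement acquisition function for minimization, and a one-dimensional search over a fixed grid of candidates. All function names (`rbf`, `gp_posterior`, `expected_improvement`, `bayes_opt`) and the toy objective are hypothetical illustrations.

```python
import math
import random

def rbf(a, b, length=0.3):
    # Squared-exponential (RBF) kernel between two scalars; length scale is an assumption
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, y):
    # Solve A x = y by Gaussian elimination with partial pivoting (small systems only)
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x, noise=1e-6):
    # GP posterior mean and standard deviation at x, given observations (xs, ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(x, a) for a in xs]
    alpha = solve(K, ys)  # K^{-1} y
    v = solve(K, k)       # K^{-1} k
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v)), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    # Expected improvement for minimization: E[max(best - f(x), 0)] under the posterior
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def bayes_opt(f, lo, hi, n_init=3, n_iter=10, seed=0):
    # Start from a few random evaluations of the expensive black-box function f
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [f(x) for x in xs]
    grid = [lo + (hi - lo) * i / 200 for i in range(201)]
    for _ in range(n_iter):
        best = min(ys)
        # Evaluate f next where expected improvement over the current best is largest
        x_next = max(grid, key=lambda x: expected_improvement(*gp_posterior(xs, ys, x), best))
        xs.append(x_next)
        ys.append(f(x_next))
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]
```

For example, `bayes_opt(lambda x: (x - 0.3) ** 2, 0.0, 1.0)` spends only 13 evaluations of the objective while searching for its minimum near 0.3. The design point is that the surrogate is cheap to query thousands of times, so all the search effort goes into choosing the next expensive evaluation; practical methods also fit kernel hyperparameters and optimize the acquisition function continuously rather than over a grid.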
Data often has geometric structure that can enable better inference; this project aims to scale up geometry-aware techniques for machine learning settings with large amounts of data, so that this structure can be exploited in practice.
The MOOC Learner Project provides learning scientists, instructional designers, and online education specialists with open-source software that enables them to efficiently extract teaching and learning insights from the data collected when students learn on the edX or Open edX platforms.
Our research seeks to discover best practices for using avatars to enhance performance, engagement, and STEM identity development for diverse public middle and high school computer science students. As sites for our research, we run workshops in which students learn computer science in fun, relevant ways and develop self-images as computer scientists.
All humans process vast quantities of unannotated speech and manage to learn phonetic inventories, word boundaries, and more, and they can use these abilities to acquire new words. Why can't ASR technology have similar capabilities? Our goal in this research project is to build speech technology using unannotated speech corpora.
Our goal is to build a system that predicts where people are looking in images. Given an image and the location of a head, our approach follows the gaze of the person and identifies the object being looked at.
Google AI’s Jeff Dean has a seemingly straightforward objective: he wants to use a collection of trainable mathematical units organized in layers to solve complicated tasks that will ultimately benefit many parts of society.
The Imagination, Computation, and Expression Laboratory at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has released a new video game called Grayscale, which is designed to sensitize players to problems of sexism, sexual harassment, and sexual assault in the workplace.
Neural networks, which learn to perform computational tasks by analyzing huge sets of training data, have been responsible for the most impressive recent advances in artificial intelligence, including speech-recognition and automatic-translation systems.
Neural networks, a form of artificial intelligence (AI), are increasingly used in technologies like self-driving cars to see and recognize objects. Such systems could even help with tasks like identifying explosives in airport security lines.
Hyper-connectivity has changed the way we communicate, wait, and productively use our time. Even in a world of 5G wireless and “instant” messaging, there are countless moments throughout the day when we’re waiting for messages, texts, and Snapchats to refresh. But our frustration with waiting a few extra seconds for our emails to push through doesn’t mean we have to simply stand by.
The butt of jokes as little as 10 years ago, automatic speech recognition is now on the verge of becoming people’s chief means of interacting with their principal computing devices. In anticipation of the age of voice-controlled electronics, MIT researchers have built a low-power chip specialized for automatic speech recognition. Whereas a cellphone running speech-recognition software might require about 1 watt of power, the new chip requires between 0.2 and 10 milliwatts, depending on the number of words it has to recognize.
Every language has its own collection of phonemes, or the basic phonetic units from which spoken words are composed. Depending on how you count, English has somewhere between 35 and 45. Knowing a language’s phonemes can make it much easier for automated systems to learn to interpret speech. In the 2015 volume of Transactions of the Association for Computational Linguistics, CSAIL researchers describe a new machine-learning system that, like several systems before it, can learn to distinguish spoken words. But unlike its predecessors, it can also learn to distinguish lower-level phonetic units, such as syllables and phonemes.