Our vision is data-driven machine learning systems that advance the quality of healthcare, the understanding of cyber arms races and the delivery of online education.
We combine methods from computer science, neuroscience and cognitive science to explain and model how perception and cognition are realized in human and machine.
We aim to develop the science of autonomy toward a future with robots and AI systems integrated into everyday life, supporting people with cognitive and physical tasks.
We are an interdisciplinary group of researchers blending approaches from human-computer interaction, social computing, databases, and information management.
We investigate language in different contexts: from how it is learned, to how it is grounded in visual perception, all the way to how machines can readily interact with humans.
Led by Web inventor and Director Tim Berners-Lee and CEO Jeff Jaffe, the W3C focuses on leading the World Wide Web to its full potential by developing standards, protocols, and guidelines that ensure the long-term growth of the Web.
Our interests span quantum complexity theory, barriers to solving P versus NP, theoretical computer science with a focus on probabilistically checkable proofs (PCP), pseudo-randomness, coding theory, and algorithms.
Our group’s goal is to create new mathematical models, grounded in microscopic connectivity and functional data, that explain how neural tissue computes.
Theory research at CSAIL covers a broad spectrum of topics, including algorithms, complexity theory, cryptography, distributed systems, parallel computing and quantum computing.
We work on a wide range of problems in distributed computing theory. We study algorithms and lower bounds for typical problems that arise in distributed systems, such as resource allocation, implementing shared-memory abstractions, and reliable communication.
Our mission is to work with policy makers and cybersecurity technologists to increase the trustworthiness and effectiveness of interconnected digital systems.
Our lab focuses on designing algorithms that extract biological insights from the large data sets produced by advances in automated data collection.
We're developing a flexible, high-performance storage architecture for database-backed applications, built around a dynamic set of developer-specified queries that Soup automatically optimizes.
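As a rough sketch of that programming model (the `QueryRegistry` class and `register` call below are hypothetical illustrations, not Soup's actual interface), a developer might declare the application's queries up front and leave the storage layer to choose and maintain materialized results for them:

```python
# Hypothetical sketch of the model described above: the developer registers the
# queries the application will run, and the storage layer is then free to build
# and incrementally maintain materialized results for them. Names are
# illustrative only, not Soup's real API.

class QueryRegistry:
    def __init__(self):
        self.queries = {}

    def register(self, name, sql):
        # Registering a query gives the storage layer a chance to optimize it,
        # e.g. by keeping its results up to date as base tables change.
        self.queries[name] = sql

registry = QueryRegistry()
registry.register(
    "votes_per_story",
    "SELECT story_id, COUNT(*) AS votes FROM votes GROUP BY story_id",
)
registry.register(
    "story_with_votes",
    "SELECT stories.*, v.votes FROM stories "
    "JOIN votes_per_story AS v ON stories.id = v.story_id "
    "WHERE stories.id = ?",
)
```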
Our goal is to develop an adaptive storage manager for analytical database workloads in a distributed setting. It works by partitioning datasets across a cluster and incrementally refining data partitioning as queries are run.
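The sketch below illustrates the "incrementally refining data partitioning as queries are run" idea in its simplest form: record which rows each query template touches, then periodically re-group co-accessed rows onto the same node. The class and method names are hypothetical and this is not the project's actual algorithm.

```python
from collections import defaultdict
from itertools import cycle

class AdaptivePartitioner:
    """Illustrative only: co-locate rows that the same query template touches."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.assignment = {}                                   # row_id -> node id
        self.touches = defaultdict(lambda: defaultdict(int))   # row_id -> {template: count}

    def record_query(self, template, row_ids):
        # Called after each query executes; remembers which rows it scanned.
        for r in row_ids:
            self.touches[r][template] += 1

    def refine(self):
        # Group rows by their most frequent query template, then spread the
        # groups across nodes so that co-accessed rows share a node.
        groups = defaultdict(list)
        for row, counts in self.touches.items():
            dominant = max(counts, key=counts.get)
            groups[dominant].append(row)
        nodes = cycle(range(self.num_nodes))
        for _, rows in groups.items():
            node = next(nodes)
            for r in rows:
                self.assignment[r] = node

p = AdaptivePartitioner(num_nodes=4)
p.record_query("orders_by_customer", row_ids=[1, 2, 3])
p.record_query("orders_by_region", row_ids=[4, 5])
p.refine()
print(p.assignment)
```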
Our goal is to develop new applications and algorithms that leverage the skills of distributed crowdworkers, notably for image and video processing applications.
BlueDBM is a cluster architecture built around fast distributed flash storage and in-storage accelerators; it often outperforms larger and more expensive clusters in applications such as graph analytics.
Data scientists routinely report that they spend at least 80% of their time finding data sets of interest, accessing them, cleaning them, and assembling them into a unified whole.
The MOOC Learner Project provides learning scientists, instructional designers and online education specialists with open source software that enables them to efficiently extract teaching and learning insights from the data collected when students learn on the edX or open edX platform.
Our goal is to help medical researchers and clinicians understand the growing repositories of waveform and signal data collected from critically ill patients.
We plan to develop a suite of graph compression and reordering techniques as part of the Ligra parallel graph processing framework to reduce space usage and improve performance of graph algorithms.
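The snippet below sketches two standard ideas behind this kind of work: reorder vertices so that neighboring vertices receive nearby IDs, and store each sorted adjacency list as small gaps rather than raw IDs. It is a generic illustration, not the Ligra implementation itself.

```python
# (1) reorder vertices (here, by descending degree) and
# (2) delta-encode each sorted adjacency list to shrink the stored integers.

def degree_order(adj):
    """Relabel vertices by descending degree; returns an old->new ID map."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    return {old: new for new, old in enumerate(order)}

def delta_encode(neighbors):
    """Encode a sorted neighbor list as the first ID plus successive gaps."""
    out, prev = [], None
    for n in sorted(neighbors):
        out.append(n if prev is None else n - prev)
        prev = n
    return out

adj = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
remap = degree_order(adj)
compressed = {
    remap[v]: delta_encode(remap[u] for u in nbrs) for v, nbrs in adj.items()
}
print(compressed)  # smaller gap values compress well with variable-length codes
```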
We aim to understand theory and applications of diversity-inducing probabilities (and, more generally, "negative dependence") in machine learning, and develop fast algorithms based on their mathematical properties.
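A minimal concrete example of such negative dependence is a determinantal point process (DPP), where the weight of a subset S is the determinant of the kernel submatrix indexed by S, so similar items are unlikely to be selected together. The kernel values below are made up for illustration.

```python
import numpy as np

def subset_weight(L, S):
    """Unnormalized DPP weight of subset S: det of the kernel submatrix L[S, S]."""
    S = list(S)
    return np.linalg.det(L[np.ix_(S, S)]) if S else 1.0

# Kernel over three items; items 0 and 1 are nearly identical, item 2 differs.
L = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

print(subset_weight(L, {0, 1}))   # small weight: similar items repel each other
print(subset_weight(L, {0, 2}))   # larger weight: the diverse pair is preferred
```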
Data often has geometric structure that can enable better inference; this project aims to scale up geometry-aware techniques to machine learning settings with large amounts of data, so that this structure can be exploited in practice.
Generation of sequential data involves multiple factors operating at different temporal scales. In natural speech, for example, speaker identity tends to be consistent within an utterance, while the phonetic content changes from frame to frame. By explicitly modeling this hierarchical generative process under a probabilistic framework, we proposed a model that learns to factorize sequence-level factors and sub-sequence-level factors into different sets of representations without any supervision.
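As a toy illustration of that hierarchical generative structure (not the proposed model itself), one latent variable can be drawn once per sequence and another per frame, with every observed frame depending on both:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_sequence(num_frames, dim=8):
    z_seq = rng.normal(size=dim)                # sequence-level factor (e.g. speaker), fixed
    frames = []
    for _ in range(num_frames):
        z_frame = rng.normal(size=dim)          # sub-sequence-level factor (e.g. phonetic content)
        x = z_seq + z_frame + 0.1 * rng.normal(size=dim)  # observed frame depends on both
        frames.append(x)
    return np.stack(frames)

utterance = generate_sequence(num_frames=20)
print(utterance.shape)  # (20, 8): every frame shares the same z_seq
```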
Our main goal is to develop fact-checking algorithms that can assess the credibility of claims made in textual statements and provide interpretable evidence explaining why a given claim is judged to be factually true or false.
Last week MIT’s Institute for Foundations of Data Science (MIFODS) held an interdisciplinary workshop aimed at tackling the underlying theory behind deep learning. Led by MIT professor Aleksander Madry, the event focused on a number of research discussions at the intersection of math, statistics, and theoretical computer science.
This week it was announced that MIT professors and CSAIL principal investigators Shafi Goldwasser, Silvio Micali, Ronald Rivest, and former MIT professor Adi Shamir won this year’s BBVA Foundation Frontiers of Knowledge Awards in the Information and Communication Technologies category for their work in cryptography.
Last month, three MIT materials scientists and their colleagues published a paper describing a new artificial-intelligence system that can pore through scientific papers and extract “recipes” for producing particular types of materials.
Neural networks, which learn to perform computational tasks by analyzing huge sets of training data, have been responsible for the most impressive recent advances in artificial intelligence, including speech-recognition and automatic-translation systems.
Artificial intelligence (AI) in the form of “neural networks” is increasingly used in technologies like self-driving cars to see and recognize objects. Such systems could even help with tasks like identifying explosives in airport security lines.
We live in the age of big data, but most of that data is “sparse.” Imagine, for instance, a massive table that mapped all of Amazon’s customers against all of its products, with a “1” for each product a given customer bought and a “0” otherwise. The table would be mostly zeroes.
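Sparse data like this is typically stored by keeping only the nonzero entries. A minimal sketch, with made-up customer and product names:

```python
# Instead of a dense customer-by-product table that is mostly zeros,
# store only the nonzero entries and treat everything else as an implicit zero.

purchases = {            # (customer, product) -> 1; zeros are never stored
    ("alice", "book"): 1,
    ("alice", "kettle"): 1,
    ("bob", "book"): 1,
}

def bought(customer, product):
    return purchases.get((customer, product), 0)

print(bought("alice", "book"))    # 1
print(bought("bob", "kettle"))    # 0, recovered without ever storing the zero
```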
Every year 40,000 women die from breast cancer in the U.S. alone. When cancers are found early, they can often be cured. Mammograms are the best test available, but they’re still imperfect and often result in false positive results that can lead to unnecessary biopsies and surgeries.
Most modern websites store data in databases, and since database queries are relatively slow, most sites also maintain so-called cache servers, which list the results of common queries for faster access. A data center for a major web service such as Google or Facebook might have as many as 1,000 servers dedicated just to caching.
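The caching pattern itself is simple: check an in-memory store before hitting the database, and remember the result afterwards. The sketch below uses a plain Python dict and SQLite purely for illustration; production sites use dedicated cache servers such as memcached, but the lookup logic is the same in spirit.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Ada')")

cache = {}

def cached_query(sql, params=()):
    key = (sql, params)
    if key in cache:                      # cache hit: skip the database entirely
        return cache[key]
    result = db.execute(sql, params).fetchall()
    cache[key] = result                   # cache miss: store the result for next time
    return result

print(cached_query("SELECT name FROM users WHERE id = ?", (1,)))  # miss, reads the DB
print(cached_query("SELECT name FROM users WHERE id = ?", (1,)))  # hit, served from cache
```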
Laparoscopy is a surgical technique in which a fiber-optic camera is inserted into a patient’s abdominal cavity to provide a video feed that guides the surgeon through a minimally invasive procedure. Laparoscopic surgeries can take hours, and the video generated by the camera — the laparoscope — is often recorded. Those recordings contain a wealth of information that could be useful for training both medical providers and computer systems that would aid with surgery, but because reviewing them is so time consuming, they mostly sit idle.
This week the Association for Computing Machinery presented CSAIL principal investigator Daniel Jackson with the 2017 ACM SIGSOFT Outstanding Research Award for his pioneering work in software engineering. (This fall he also received the ACM SIGSOFT Impact Paper Award for his research method for finding bugs in code.) An EECS professor and associate director of CSAIL, Jackson was given the Outstanding Research Award for his “foundational contributions to software modeling, the creation of the modeling language Alloy, and the development of a widely used tool supporting model verification.”
Speech recognition systems, such as those that convert speech to text on cellphones, are generally the result of machine learning. A computer pores through thousands or even millions of audio files and their transcriptions, and learns which acoustic features correspond to which typed words. But transcribing recordings is costly, time-consuming work, which has limited speech recognition to a small subset of languages spoken in wealthy nations.
November 2, 2016 - Sarah Parcak of the University of Alabama gave a Dertouzos Distinguished Lecture titled "Hacking Archaeology: Beyond Shovels or iSandbox?"
Every language has its own collection of phonemes, or the basic phonetic units from which spoken words are composed. Depending on how you count, English has somewhere between 35 and 45. Knowing a language’s phonemes can make it much easier for automated systems to learn to interpret speech. In the 2015 volume of Transactions of the Association for Computational Linguistics, CSAIL researchers describe a new machine-learning system that, like several systems before it, can learn to distinguish spoken words. But unlike its predecessors, it can also learn to distinguish lower-level phonetic units, such as syllables and phonemes.
We aim to develop a systematic framework for robots to build models of the world and to use those models to choose effective and safe actions in complex scenarios.
Alloy is a language for describing structures and a tool for exploring them. It has been used in a wide range of applications, from finding holes in security mechanisms to designing telephone switching networks. A main remaining challenge is to make the system more usable and understandable for its users.
Automatic speech recognition (ASR) has been a grand challenge machine learning problem for decades. Our ongoing research in this area examines the use of deep learning models for distant and noisy recording conditions, as well as for multilingual and low-resource scenarios.
We study the fundamentals of Bayesian optimization and develop efficient Bayesian optimization methods for the global optimization of expensive black-box functions arising in a range of different applications.
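Below is a minimal sketch of the basic Bayesian optimization loop, assuming a Gaussian-process surrogate with an RBF kernel, an upper-confidence-bound acquisition rule, and a discrete candidate grid. The toy `objective` merely stands in for an expensive black-box function; none of this represents the group's actual methods.

```python
import numpy as np

def rbf(a, b, length=0.3):
    """RBF kernel between two 1-D arrays of points."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """GP posterior mean and variance at candidate points Xs given data (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks)
    return mu, np.maximum(var, 0.0)

def objective(x):                      # pretend this is expensive to evaluate
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

candidates = np.linspace(-1.0, 2.0, 200)
X = np.array([-0.5, 1.5])              # initial design points
y = objective(X)

for _ in range(10):
    mu, var = gp_posterior(X, y, candidates)
    ucb = mu + 2.0 * np.sqrt(var)      # optimism in the face of uncertainty
    x_next = candidates[np.argmax(ucb)]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

print("best x found:", X[np.argmax(y)], "value:", y.max())
```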
Our main goal is to automatically search for relevant answers among many responses provided for a given question (Answer Selection), and search for relevant questions to reuse their existing answers (Question Retrieval).
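As a simple baseline illustration of both tasks (not the group's actual models), candidates can be ranked by bag-of-words cosine similarity to the query, whether the candidates are answers (answer selection) or previously asked questions (question retrieval). The example texts are made up.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words representation of a text."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, candidates):
    """Return candidates sorted by similarity to the query, most relevant first."""
    q = bow(query)
    return sorted(candidates, key=lambda c: cosine(q, bow(c)), reverse=True)

answers = [
    "Restart the router and check the cable.",
    "The router firmware can be updated from the admin page.",
]
print(rank("how do I update my router firmware", answers))
```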