UROP Research Opportunities

The Undergraduate Research Opportunities Program (UROP) cultivates and supports research partnerships between MIT undergraduates and faculty. If you have any questions, please contact tluongo@csail.mit.edu or take a look at the How to UROP at CSAIL document (PDF format).

This program is available to MIT students only.


  • Project: To program a model of a fly's visual tracking.

    We are looking for a UROP to program the simulated behavior of several artificial flies interacting visually with each other. Each fly is described by a simple tracking system (Buelthoff, Poggio and Wehrhahn 1980; Wehrhahn, Poggio and Buelthoff 1982) which summarizes behavioral experiments in which individual real flies track and chase targets. The model for this behavior is suggested by M. Poggio and T. Poggio in their paper: Cooperative physics of fly swarms: an emergent behavior. A.I. Memo No. 1512, C.B.C.L. Paper No. 103 (1994). We expect the model to be programmed and hopefully also...

    Posted date: March 21, 2013
  • Video Magnification

    We are developing algorithms to manipulate temporal variations in videos, to reveal small imperceptible changes (check out this video, and read more about it here) as well as automatically remove distracting changes (see here).

    We are looking for a strong and motivated student to work closely with us on exploring potential applications. This includes trying out our methods with different kinds of data, such as medical images (fMRI), satellite imagery, seismic data and time-lapse videos, developing tools to facilitate the experiments, and potentially helping us tune and improve the...

    Posted date: March 14, 2013
  • Bringing Large Scale Machine learning service to the desktop

    Wouldn't it be exciting to call classify(dataFileLoc) or regress(dataFileLoc) on the command line and spin up hundreds of nodes or more on the cloud that access the data at the location specified by dataFileLoc, run machine learning, and return results? We have built a large-scale, cloud-based machine learning system. This paper explains the latest version of our system. Our system is a collection of distributed compute units and its design...

    Posted date: February 12, 2013
  • Evolutionary Design and Optimization Group: Student Research Opportunities for Spring/Summer 2013

    Posted date: February 12, 2013
  • Mining a MOOC's activity data: 6.002X explored

    We are building a variety of machine learning algorithms for mining data generated while delivering educational content to hundreds and thousands of students all over the world. A very fundamental question that folks in education are attempting to answer is: "What worked?" Answering this question would require us to analyze data in novel ways, for example building models of students, balancing for confounding factors. We are looking for a talented UROP or MEng student to work with a Research Scientist and a group of scientists and fellows at the MIT EdX team. This project has possible...

    Posted date: February 12, 2013
  • Big Data + Machine learning + Medicine + Volunteer compute: Could it get any more exciting?

    Come join us and learn how we are building a large-scale machine learning system through which we are attempting to solve some of the most challenging problems for our society. The most fascinating part of this is that we want to do this by using the leftover CPU cycles on machines all across the world. Technically, this creates a challenge for us, as we are not able to centrally coordinate and plan data distribution and algorithmic steps, or collect and process results. During the first two years we have made a lot of progress and are now seeking students to work with us in deploying...

    Posted date: February 12, 2013
  • BP-Watch: Predicting blood pressure in an ICU setting

    We are building a large-scale system that predicts the blood pressure of a patient under intensive care. The project relies on cloud-scale machine learning of many diverse predictive models. A variety of tasks are on the agenda, including cloud-scale empirical experimentation, cross-referencing model predictions to clinical events, time series modeling, unsupervised learning of similar blood pressure segments and ultimately transforming many model outputs, which are in the form of probabilities and predictions, into visualizations that are succinct and informative to the doctors...

    Posted date: February 12, 2013
  • Feature Decision Boundaries and Quantization for Big Data Classification with ML

    When building a rule-based classifier (aka decision list) that allows readability, the decision boundaries have a significant effect on the accuracy of the solutions. The goal of this project is to develop efficient methods and algorithms to identify decision boundaries for large feature sets. We are working with a large-scale classification problem in the medical domain with possibly hundreds and thousands of variables, some of which are tightly correlated. Identifying thresholds for decision boundaries at this scale is intractable without efficient methods. You will work with a team of researchers with strong...

    Posted date: February 12, 2013
  • FlexGP: Evaluating a Million models on a Billion cases

    Our FlexGP system currently generates thousands of non-linear models of the form y=f(x), where f(.) could be any mathematical function built from a set of operators such as log, sin, and sqrt. For example, an expression could be y=log(x1)+sin(x2). For big data problems we have to perform multiple passes through the data, each time applying the model to the data and measuring its accuracy, to identify the set of non-linear expressions that best explain the data. In this regard we are investigating and developing methods in order to be able to evaluate a million models on a billion...
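
    As a rough illustration of what "applying the model to the data and measuring its accuracy" involves, here is a minimal single-machine sketch. It is not the FlexGP implementation: the candidate expressions, the mean squared error measure, the toy data, and every function name are assumptions made for this example.

```python
import math

# Hypothetical candidate models of the form y = f(x1, x2).
# These expressions are illustrative only, not FlexGP output.
MODELS = {
    "log(x1) + sin(x2)": lambda x1, x2: math.log(x1) + math.sin(x2),
    "sqrt(x1) + x2": lambda x1, x2: math.sqrt(x1) + x2,
}


def mean_squared_error(model, rows):
    """One pass over the data: apply the model to each row, average the squared error."""
    total = 0.0
    for x1, x2, y in rows:
        total += (model(x1, x2) - y) ** 2
    return total / len(rows)


def rank_models(models, rows):
    """Return (expression, error) pairs sorted from most to least accurate."""
    scored = [(expr, mean_squared_error(f, rows)) for expr, f in models.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy data generated from y = log(x1) + sin(x2), so that expression should rank first.
ROWS = [
    (x1, x2, math.log(x1) + math.sin(x2))
    for x1 in (1.0, 2.0, 3.0)
    for x2 in (0.0, 0.5, 1.0)
]

if __name__ == "__main__":
    for expr, err in rank_models(MODELS, ROWS):
        print(f"{expr}: MSE = {err:.4f}")
```

    Every model needs a full pass over every row, so the cost grows as (number of models) x (number of cases). That product is what makes a million models on a billion cases hard.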

    Posted date: February 12, 2013
  • Machine Learning and Big Data: Performance Analytics

    Machine learning systems depend on parameters and sometimes, buried deep inside, some randomized initial state. So, when we run them on big data with different parameterizations, how can we unify the results? Also, how can we interpret an ML system's rules, classifiers, or models to learn how to iterate with an updated question for the ML system so we get better accuracy? Is the algorithm having trouble predicting a certain class? Why? Is it because of class imbalance or inadequate discriminatory power of a feature? Should we adjust the objective function to address these issues? Are the...

    Posted date: February 12, 2013
  • Predicting "Rare" events in an ICU

    We are developing a system that predicts rare events such as hypotensive episodes in an ICU setting. We have assembled a large arterial blood pressure feature-level dataset from a publicly available waveform dataset. One of the challenges is that the balance of the classes in the data is extremely skewed due to the rare nature of the events we are interested in. This imbalance in the data can significantly impact the accuracy of the forecast, and it especially affects the dynamics of our iterative learning engine. The goal of the project is to develop and identify an efficient...

    Posted date: February 12, 2013
  • Scalable methods for fusing Multiple models generated for big data

    When dealing with big data we generate thousands of models, each of which specializes on a subset of the data. We are developing techniques that combine these models by learning weights for fusing their predictions. The techniques range from a simple average to a weighted sum to probabilistic approaches. Known as ensemble learning, these methods allow users to reach higher prediction accuracy than any single model.
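
    The two simplest fusion techniques named above, a simple average and a weighted sum, can be sketched as follows. This is a minimal illustration, not the group's method: the inverse-error weighting rule, the numbers, and the function names are all assumptions chosen for the example.

```python
def simple_average(predictions):
    """Fuse by giving every model the same weight."""
    return sum(predictions) / len(predictions)


def inverse_error_weights(validation_errors):
    """Assumed weighting rule: weight each model by 1/error, normalized to sum to 1.

    Errors must be strictly positive.
    """
    inverse = [1.0 / e for e in validation_errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_sum(predictions, weights):
    """Fuse by a weighted sum of the individual model predictions."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical predictions from three models for one case, and each
# model's error on held-out data (the third model is the least accurate).
PREDICTIONS = [10.0, 12.0, 20.0]
VALIDATION_ERRORS = [1.0, 1.0, 4.0]
WEIGHTS = inverse_error_weights(VALIDATION_ERRORS)

if __name__ == "__main__":
    print("simple average:", simple_average(PREDICTIONS))
    print("weighted sum:  ", weighted_sum(PREDICTIONS, WEIGHTS))
```

    With these numbers the weighted sum sits closer to the two more accurate models than the simple average does, because the third model's weight is reduced.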

    See for example...

    Posted date: February 12, 2013
  • Bibliometrics using Machine Learning and Natural Language Processing

    Reviewing past and current literature is a key scientific and engineering activity.  Quantitative analysis of the language, topics, keywords, or citation graphs of any given subset of scientific literature (bibliometrics) can be a great help to understanding what has been done in a field and what the important next steps are. Unfortunately, off-the-shelf support for automatic extraction of citation graphs and analysis of those graphs using natural language processing and machine learning is still relatively limited.  This UROP project will aim to advance the state of the art of...

    Posted date: February 04, 2013
  • Understanding the Human Learning Process Using AI Techniques


    We are looking for strong Python programmers interested in contributing to a core learning architecture, which is going to set the standard internationally for reinforcement learning research. Through the process you will learn how to formulate many sequential decision-making problems, such as balancing an inverted pendulum, as Markov Decision Processes (MDPs), and how to solve such problems.
    The following skills are required:
    - Object Oriented Programming in Python
    - Linear Algebra

    The following skills are highly desired:

    Posted date: December 19, 2012
  • Policies for personal information on the Web

    At CSAIL’s Decentralized Information Group, we think about information on the Web: where it comes from, what happens to it, and what the rules are for using it. We’ve all seen stories about what people can learn about you from social networks, and the good and bad consequences of that. How can we promote the good consequences while minimizing risks and harmful effects? At DIG, we take the perspective that data on the Web should travel together with additional information that says where it comes from (provenance and context) and how it should be used (policy). We build to help...

    Posted date: December 11, 2012
  • Learning computing by building mobile apps

    In December 2010, Google introduced App Inventor for Android, a visual programming environment that makes it easy to create apps for Android phones. Prof. Abelson, who worked on App Inventor during his sabbatical last year, is planning to include the system in a new course this fall. He’s looking for help in developing good examples and creating extensions to the system, maybe even teaching summer workshops for kids. To work on this project, you should have some experience with Python (6.00 or 6.01) and an interest in mobile apps and educational technology. For more information see...

    Posted date: December 11, 2012
  • Authoring of Online Video Lectures

    We are developing new authoring software for video lectures in the “virtual white board” style popularized by the Khan Academy. Many UROP topics are available to help with this endeavor.

    Contact: Fredo Durand fredo@mit.edu

    Posted date: December 04, 2012
  • Visualizations for the Online Python Tutor

    The goal of this UROP is to augment the Online Python Tutor http://pythontutor.com/ with visualizations of the flow and variable changes of a program over time. This is partially inspired by Bret Victor’s Learnable Programming essay http://worrydream.com/LearnableProgramming/ .

    Contact: Fredo Durand fredo@mit.edu

    Posted date: December 04, 2012
  • Knit: Integrating Human Based Partial Analyses of Big Data

    We are developing means to automatically assist analysts/experts to identify patterns and detect anomalies in big data streams as they arise when heterogeneous, unstructured data sources are consulted. Our approach solely relies upon analysts and their ability to group/categorize "common situations" into patterns. As one can imagine, an analyst can only process a subset of the big data. We are developing machine learning algorithms that will use these partial groupings for a subset of the big data from each analyst and knit together, i.e....

    Posted date: December 04, 2012
  • Cloud Computing for Mathematics, Science, and Engineering – The Julia Project

    What is Julia? See the BLOG: http://julialang.org/blog/2012/02/why-we-created-julia/ which answers why we created Julia. In short, because we are greedy. We are power Matlab users. Some of us are Lisp hackers. Some are Pythonistas, others Rubyists, still others Perl hackers. There are those of us who used Mathematica before we could grow facial hair. There are those who still can’t grow facial hair. We’ve generated more R plots than any sane...

    Posted date: August 07, 2012