Machine Learning and Big Data: Performance Analytics

Machine learning systems depend on parameters and sometimes, buried deep inside, on a randomized initial state. So when we run them on big data with different parameterizations, how can we unify the results? Likewise, how can we interpret an ML system's rules, classifiers, or models so that we can iterate with an updated question and get better accuracy? Is the algorithm having trouble predicting a certain class? Why? Is it because of class imbalance or a feature's inadequate discriminatory power? Should we adjust the objective function to address these issues? Are the results consistent? Robust?
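One way to start on the consistency question is to compare per-class recall across runs that differ only in their random seed or parameterization. The sketch below is illustrative only: the function names, the run names, and the toy labels are hypothetical and are not taken from the project's data or code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_class_recall(y_true, y_pred):
    """Fraction of the true instances of each class that were predicted correctly."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}


def summarize_runs(y_true, runs):
    """Mean and population standard deviation of per-class recall across runs."""
    recalls = [per_class_recall(y_true, y_pred) for y_pred in runs.values()]
    summary = {}
    for c in sorted(set(y_true)):
        values = [r[c] for r in recalls]
        summary[c] = (mean(values), pstdev(values))
    return summary


# Hypothetical toy labels; "seed_0" and "seed_1" stand in for two runs
# of the same model with different random initial states.
y_true = ["low", "low", "medium", "medium", "high", "high", "high", "high"]
runs = {
    "seed_0": ["low", "medium", "medium", "medium", "high", "high", "high", "high"],
    "seed_1": ["low", "low", "medium", "high", "high", "high", "high", "medium"],
}

summary = summarize_runs(y_true, runs)
for label, (avg, spread) in summary.items():
    print(f"{label}: mean recall {avg:.3f}, std {spread:.3f}")
```

A class whose recall has a low mean is hard to predict; a class whose recall has a large spread across seeds points to a robustness problem rather than to the class itself.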

This project will introduce you to a machine learning system that runs in a large-scale distributed setting and learns from large datasets. It will familiarize you with the process of engineering a problem for the ML algorithm to solve, making sense of the algorithm's results and behavior, and then iterating with new ideas for addressing the problem. The specific setting is a prediction problem: will an ICU patient's blood pressure be high, medium, or low after a lead time? We have collected a large archive of data, generated multiple predictive models, and are now analyzing the results. We are asking questions such as: Which cohorts of patients are hard to predict, and why? Which class labels are hard to predict, and why? Your project will be closely integrated into a team effort; the team consists of a post-doc and graduate students.
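The cohort question can be approached by grouping predictions by cohort and ranking the cohorts by accuracy. This is a minimal sketch under stated assumptions: the record layout (cohort, true label, predicted label), the cohort names, and the helper functions are hypothetical, and the toy records are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_cohort(records):
    """Accuracy per cohort from (cohort, true_label, predicted_label) tuples."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for cohort, true_label, predicted_label in records:
        counts[cohort] += 1
        hits[cohort] += int(true_label == predicted_label)
    return {c: hits[c] / counts[c] for c in counts}


def hardest_cohorts(records, k=1):
    """The k cohorts with the lowest accuracy, hardest first."""
    acc = accuracy_by_cohort(records)
    return sorted(acc, key=acc.get)[:k]


# Hypothetical toy records; the cohort names are placeholders.
records = [
    ("cohort_a", "high", "high"),
    ("cohort_a", "low", "low"),
    ("cohort_a", "medium", "medium"),
    ("cohort_a", "low", "medium"),
    ("cohort_b", "high", "medium"),
    ("cohort_b", "low", "medium"),
    ("cohort_b", "medium", "medium"),
    ("cohort_b", "high", "high"),
]

cohort_accuracy = accuracy_by_cohort(records)
print(cohort_accuracy)
print(hardest_cohorts(records))
```

Accuracy alone only flags where the model struggles; the "why" still requires looking at class balance and feature distributions within the flagged cohort.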

MEng students, and juniors and seniors looking to lead into an MEng via 6.UAT or UAP
Background: Course 6 software courses and machine learning knowledge (6.034 and 6.867)
Please contact:, or