FlexGP: Evaluating a Million Models on a Billion Cases

Our FlexGP system currently generates thousands of non-linear models of the form y = f(x), where f(.) can be any mathematical function built from a set of operators such as log, sin, and sqrt. For example, an expression could be y = log(x1) + sin(x2). For big data problems we must make multiple passes through the data, each time applying a model and measuring its accuracy, in order to identify the set of non-linear expressions that best explain the data. We are therefore investigating and developing methods to evaluate a million models on a billion data points in a fraction of a second. How can we reach these speeds? Are they even attainable? Achieving them will be paramount to making learning from big data successful.

We are seeking an undergraduate, UAP, or senior who is interested in a challenging, very tangible, measurable problem like this and enjoys generating speedups never heard of before.
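
To make the evaluation cost concrete, here is a minimal sketch of a single-model, single-pass evaluation in Python/NumPy. The model encoding and the mse helper are illustrative assumptions, not FlexGP's actual interface:

    import numpy as np

    # Hypothetical encoding: a model maps a data matrix X
    # (n points x d features) to a vector of n predictions.
    def model(X):
        # y = log(x1) + sin(x2), applied to all n points at once
        return np.log(X[:, 0]) + np.sin(X[:, 1])

    def mse(f, X, y):
        # One pass through the data: apply the model, then measure accuracy.
        residual = f(X) - y
        return float(np.mean(residual ** 2))

    rng = np.random.default_rng(0)
    X = rng.uniform(1.0, 10.0, size=(1_000_000, 2))  # a million points: 1/1000th of a billion
    y = np.log(X[:, 0]) + np.sin(X[:, 1])
    print(mse(model, X, y))  # ~0.0, since model is the true expression

Even this fully vectorized pass costs milliseconds per model per million points on a single core, and a million models on a billion points amounts to roughly 10^15 operator applications, which is why naive looping cannot get there and new evaluation methods are needed.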

MEng students, and juniors and seniors looking to lead into an MEng via 6.UAT or UAP
Background: Course 6 software and machine learning coursework (6.034 and 6.867)
Please contact: kalyan@csail.mit.edu or unamay@csail.mit.edu