Challenges in Reliable Machine Learning


Kamalika Chaudhuri
University of California, San Diego


Philippe Rigollet
MIT, Department of Mathematics
As machine learning is increasingly used in real applications, there is a need for reliable and robust methods. In this talk, we will discuss two such challenges that arise in reliable machine learning. The first is sample selection bias, where training data is available from a distribution conditioned on a sample selection policy, but the resultant classifier needs to be evaluated on the entire population. We will show how we can use active learning to get a small amount of labeled data from the entire population that can be used to correct this kind of sample selection bias. The second is robustness to adversarial examples -- slight strategic perturbations of legitimate test inputs that cause misclassification. We next look at adversarial examples in the context of a simple non-parametric classifier -- the k-nearest neighbor classifier, and look at its robustness properties. We provide bounds on its robustness as a function of k, and propose a more robust 1-nearest neighbor classifier.

Joint work with Songbai Yan, Tara Javidi, Yaoyuan Yang, Cyrus Rastchian, Yizhen Wang and Somesh Jha.

Kamalika Chaudhuri received a Bachelor of Technology from the Indian Institute of Technology, Kanpur, and a PhD in Computer Science from the University of California, Berkeley. Currently, she is an Associate Professor at the University of California, San Diego. She is a recipient of the NSF Career Award, Hellman Faculty Fellowship, and Google and Bloomberg Faculty Awards.

Kamalika's research is on the foundations of trustworthy machine learning -- which includes problems such as learning from sensitive data while preserving privacy, learning under sampling bias, and in the presence of an adversary. She is also broadly interested in a number of topics in learning theory, such as non-parametric methods, online learning, and active learning.