Algorithmic Challenges in Machine Learning
Speaker: Kamalika Chaudhuri, University of California, San Diego
Date: March 4, 2010
Time: 4:00PM to 5:00PM
Host: Tommi Jaakkola and Pablo Parrilo, MIT
Contact: Francis Doughty, 253-4602, firstname.lastname@example.org
In this talk, we address two algorithmic challenges in machine learning.
First, with the increase in electronic record-keeping, many datasets
that learning algorithms work with relate to sensitive information
about individuals. Thus the problem of privacy-preserving learning --
how to design learning algorithms that operate on the sensitive data
of individuals while still guaranteeing the privacy of individuals in
the training set -- has gained great practical importance. In this
talk, we address the problem of privacy-preserving classification, and
we present an efficient classifier which is private in the
differential privacy model of Dwork et al. Our classifier works in the
ERM (empirical risk minimization) framework, and includes
privacy-preserving logistic regression and privacy-preserving support
vector machines. We show that our classifier is private, provide
analytical bounds on the sample requirement of our classifier,
and evaluate it on some real data.
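As a rough illustration of what a differentially private ERM classifier can look like, the sketch below uses output perturbation for L2-regularized logistic regression: train as usual, then add noise to the weight vector. The abstract does not say which mechanism the talk uses, so this choice is an assumption, and the names train_logreg and private_logreg are hypothetical. The noise scale 2/(n * lam * epsilon) assumes every feature vector has norm at most 1 and that training reaches the exact minimizer; plain gradient descent only approximates it.

```python
import numpy as np

def train_logreg(X, y, lam, steps=2000, lr=0.5):
    """Gradient descent on mean logistic loss + (lam/2)*||w||^2, labels in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)
        # d/dw log(1 + exp(-m)) = -y * x / (1 + exp(m))
        grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0) + lam * w
        w -= lr * grad
    return w

def private_logreg(X, y, lam, epsilon, rng):
    """Output perturbation (assumed mechanism): add noise with density
    proportional to exp(-(n*lam*epsilon/2) * ||b||) to the trained weights."""
    n, d = X.shape
    w = train_logreg(X, y, lam)
    scale = 2.0 / (n * lam * epsilon)        # assumes ||x_i|| <= 1 for all i
    norm = rng.gamma(shape=d, scale=scale)   # noise magnitude
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)   # uniform random direction
    return w + norm * direction

# Synthetic usage example (illustrative data, not from the talk).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # enforce ||x_i|| <= 1
y = np.where(X[:, 0] > 0, 1.0, -1.0)
w_private = private_logreg(X, y, lam=0.1, epsilon=1.0, rng=rng)
```

A smaller epsilon gives stronger privacy but larger noise, which is why bounds on the sample requirement matter: more data shrinks the noise scale for a fixed privacy level.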
Second, we address the problem of clustering when data is
available from multiple domains or views. For example, when clustering
a document corpus such as Wikipedia, we have access to the contents of
the documents and their link structure. In this talk, we address this
problem of Multiview Clustering, and show how to use information from
multiple views to improve clustering performance. We present an
algorithm for multiview clustering, provide analytical bounds on the
performance of our algorithm under certain statistical assumptions,
and finally evaluate our algorithm on some real data.
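To make the multiview idea concrete, here is a minimal sketch that projects one view onto the directions most correlated with the other view (canonical correlation analysis) and then runs k-means on the projection. The abstract does not name the algorithm, so the use of CCA is an assumption made for illustration, and cca_project and kmeans are hypothetical helper names. The synthetic data assumes the two views are independent given the cluster label.

```python
import numpy as np

def cca_project(X1, X2, k, reg=1e-6):
    """Project view 1 onto its top-k canonical directions with respect to view 2."""
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    C11 = X1.T @ X1 / n + reg * np.eye(X1.shape[1])
    C22 = X2.T @ X2 / n + reg * np.eye(X2.shape[1])
    C12 = X1.T @ X2 / n

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    W1, W2 = inv_sqrt(C11), inv_sqrt(C22)
    U, _, _ = np.linalg.svd(W1 @ C12 @ W2)   # SVD of the whitened cross-covariance
    return X1 @ W1 @ U[:, :k]

def kmeans(Z, k, rng, iters=50):
    """Plain Lloyd's algorithm with random initial centers."""
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Synthetic two-cluster example: both views share the hidden label z.
rng = np.random.default_rng(0)
n = 400
z = rng.integers(0, 2, size=n)
signs = 2.0 * z - 1.0
mu1 = np.array([3.0, 0.0, 0.0, 0.0, 0.0])
mu2 = np.array([0.0, 3.0, 0.0, 0.0])
view1 = signs[:, None] * mu1 + rng.normal(size=(n, 5))
view2 = signs[:, None] * mu2 + rng.normal(size=(n, 4))
labels = kmeans(cca_project(view1, view2, k=1), k=2, rng=rng)
```

The intuition behind this design is that noise specific to one view is uncorrelated with the other view, so the correlated directions tend to retain the cluster structure while discarding view-specific noise.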
Based on joint work with Sham Kakade (UPenn), Karen Livescu (TTI
Chicago), Claire Monteleoni (CCLS Columbia), Anand Sarwate (ITA UCSD),
and Karthik Sridharan (TTI Chicago).
Kamalika Chaudhuri received a Bachelor of Technology degree in
Computer Science and Engineering in 2002 from the Indian Institute of
Technology, Kanpur, and a PhD in Computer Science from UC Berkeley in
2007. She is currently a postdoctoral researcher at the Computer
Science and Engineering Department at UC San Diego.
Kamalika's research is on the design and analysis of machine learning
algorithms and their applications. In particular, her interests lie in
clustering, online learning, and privacy-preserving machine learning,
as well as applications of machine learning and algorithms to
practical problems in other areas.