Security and Privacy Guarantees in Machine Learning with Differential Privacy


Roxana Geambasu
Columbia University


Frans Kaashoek
Machine learning (ML) is becoming a critical foundation for how we construct the code driving our applications, cars, and life-changing financial decisions. Yet it is often brittle and unstable, making decisions that are hard to understand and that can be exploited. As one example, tiny changes to an input can cause dramatic changes in predictions; this results in decisions that surprise, appear unfair, or open attack vectors such as adversarial examples. As another example, models trained on users' data have been shown to encode not only general trends from large datasets but also very specific, personal information from these datasets, such as Social Security numbers and credit card numbers from emails; this threatens to expose users' secrets through ML predictions or parameters. Over the years, researchers have proposed various approaches to address these rather distinct security, privacy, and transparency challenges. Most of this work has been best-effort, offering no rigorous guarantees, which is insufficient if ML is to become a rigorous basis for how we construct our code.

This talk positions differential privacy (DP) -- a theory developed by the privacy community -- as a versatile foundation for building into ML much-needed guarantees not only of privacy but also of security, stability, and transparency. As supporting evidence, I first present PixelDP, a scalable certified defense against adversarial examples that leverages DP theory to guarantee a level of robustness against this attack. I then present Sage, a DP ML platform that bounds the leakage of personal secrets through ML models while addressing some of the most pressing challenges of DP, such as the "running out of privacy budget" problem. Both PixelDP and Sage are designed from a pragmatic systems perspective and illustrate that DP theory is powerful but requires adaptation to achieve practical guarantees for ML workloads.