Toward Provable Privacy for Black-Box Algorithms via Algorithmic Stability
Speaker
Host
This thesis focuses on enabling the design of algorithms with provable privacy as a first-order goal. We consider the following setting: an algorithm is trained on sensitive data, and the result is then exposed for public use -- how can we quantify the privacy risk of this exposure? Prior work typically provides privacy by privatizing specific algorithms. These techniques have two main drawbacks: (1) they require significant per-algorithm white-boxing, and (2) their privacy-utility tradeoffs can be hard to quantify.
We first leverage the PAC privacy framework to mitigate the white-boxing requirement. In this talk, we show how to privatize a wide range of database queries in a black-box manner. We describe a simple privatization layer, PAC-DB, that provides provable privacy guarantees for general SQL queries. This expands the capabilities of private database analytics, enabling complex queries without a trusted curator.
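To give a flavor of black-box privatization in the spirit of PAC privacy, the toy sketch below re-runs a query on resampled data to estimate how unstable its output is, then adds Gaussian noise scaled to that instability. The function name, the bootstrap-based calibration rule, and the noise multiplier are all illustrative assumptions for this sketch, not PAC-DB's actual mechanism.

```python
import random
import statistics

def privatize_blackbox(query, dataset, trials=200, noise_multiplier=1.0):
    """Illustrative black-box privatization: treat `query` as an opaque
    function, estimate its output variability under resampling, and
    release the true answer plus noise scaled to that variability.
    (Hypothetical sketch; not PAC-DB's calibration procedure.)"""
    # Re-run the query on bootstrap resamples to gauge output instability.
    outputs = []
    for _ in range(trials):
        resample = random.choices(dataset, k=len(dataset))
        outputs.append(query(resample))
    sigma = statistics.pstdev(outputs)
    # Release the true answer with noise proportional to that instability.
    return query(dataset) + random.gauss(0.0, noise_multiplier * sigma)

# Example: privatize an AVG(salary)-style SQL aggregate.
random.seed(0)
salaries = [52_000, 61_000, 58_500, 47_000, 73_000]
avg_query = lambda rows: sum(rows) / len(rows)
noisy_avg = privatize_blackbox(avg_query, salaries)
```

Note that nothing here inspects the query's internals; only its input-output behavior is used, which is what makes the approach black-box.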
We then turn to understanding the utility impact of privatization by designing privacy-conscious algorithms. That is, rather than first constructing an algorithm and then computing the noise required to privatize it --- a paradigm we refer to as post-hoc privatization --- we optimize the algorithm's hyperparameters given the privacy budget. We instantiate this via regularized linear regression: we derive the theoretically optimal regularization weight that maximizes utility under a given privacy budget, and we provide experimental results showing the benefits of privacy-conscious design over post-hoc privatization.
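The privacy-conscious idea can be sketched with a toy one-dimensional ridge regression. Under bounded data, the ridge estimator's sensitivity shrinks as the regularization weight grows, so a larger weight needs less privatizing noise but adds more bias; sweeping the weight and picking the best expected error under a fixed budget mimics the privacy-conscious design. The simplified sensitivity bound, the Laplace noise, and the grid search here are stand-ins for the thesis's closed-form derivation, not the actual result.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_ridge_1d(x, y, lam, epsilon):
    """Output-perturbed 1-D ridge regression (illustrative sketch).
    For |x|, |y| <= 1, the estimator's sensitivity scales like
    1/(n * lam), so heavier regularization needs less noise."""
    n = len(x)
    w = (x @ y) / (x @ x + lam * n)      # ridge estimate of the slope
    sensitivity = 2.0 / (n * lam)        # simplified bound for bounded data
    return w + rng.laplace(0.0, sensitivity / epsilon)

# Privacy-conscious design: choose the regularization weight that minimizes
# expected error under the fixed budget, rather than privatizing post hoc.
x = rng.uniform(-1, 1, 500)
y = np.clip(0.7 * x + 0.1 * rng.normal(size=500), -1, 1)
epsilon = 1.0
lams = [0.001, 0.01, 0.1, 1.0]
errs = [np.mean([(private_ridge_1d(x, y, lam, epsilon) - 0.7) ** 2
                 for _ in range(200)]) for lam in lams]
best_lam = lams[int(np.argmin(errs))]
```

The sweep makes the tradeoff concrete: very small weights are swamped by noise, very large ones by regularization bias, and the utility-maximizing weight sits in between.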