Abstract: Hypothesis testing plays a central role in statistical inference, and is used in many settings where privacy concerns are paramount. In this talk we’ll address a basic question about privately testing simple hypotheses: given two distributions P and Q, and a privacy level \epsilon, how many i.i.d. samples are needed to distinguish P from Q subject to \epsilon-differential privacy, and what sort of tests have optimal sample complexity? Specifically, we'll characterize this sample complexity up to constant factors in terms of the structure of P and Q and the privacy level \epsilon, and show that this sample complexity is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. This result is an analogue of the classical Neyman–Pearson lemma in the setting of private hypothesis testing. The characterization applies more generally to hypothesis tests satisfying essentially any notion of algorithmic stability, which is known to imply strong generalization bounds in adaptive data analysis, and thus our results have applications even when privacy is not a primary concern.
Joint work with Clement Canonne, Gautam Kamath, Adam Smith and Jonathan Ullman.