[ML+Crypto] Statistically Undetectable Backdoors in Deep Neural Networks
Title: Statistically Undetectable Backdoors in Deep Neural Networks
Speaker: Neekon Vafa (MIT)
Time: Tuesday, October 21, 10:30–11:45am
Location: 32-G575
Seminar series: ML+Crypto
In this talk, I will show how an adversarial model trainer can plant backdoors in a large class of deep, feedforward neural networks. These backdoors are statistically undetectable in the white-box setting, meaning that the backdoored and honestly trained models are close in total variation distance, even given the full descriptions of the models (e.g., all of the weights). The backdoor provides access to (invariance-based) adversarial examples for every input. However, without the backdoor, no one can generate any such adversarial examples, assuming the worst-case hardness of shortest vector problems on lattices. Our main technical tool is a cryptographic perspective on the ubiquitous Johnson-Lindenstrauss lemma.
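
As a quick illustration of that last point (an editorial sketch, not material from the talk): the Johnson-Lindenstrauss lemma says that a random linear map into roughly O(log n / eps^2) dimensions preserves all pairwise distances among n points up to a (1 ± eps) factor, with high probability. A minimal numpy demonstration, with a heuristic constant in the target dimension:

import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 100, 5_000, 0.25
k = int(np.ceil(8 * np.log(n) / eps**2))  # target dimension; the constant 8 is heuristic

X = rng.standard_normal((n, d))               # n arbitrary points in R^d
A = rng.standard_normal((k, d)) / np.sqrt(k)  # random Gaussian JL projection
Y = X @ A.T                                   # embedded points in R^k

def pairwise_dists(Z):
    # Euclidean distances between all pairs of rows, via the identity
    # ||u - v||^2 = ||u||^2 + ||v||^2 - 2<u, v>.
    sq = (Z ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * Z @ Z.T
    iu = np.triu_indices(len(Z), 1)
    return np.sqrt(np.maximum(D2[iu], 0.0))

ratio = pairwise_dists(Y) / pairwise_dists(X)
print(f"k = {k}; distance ratios in [{ratio.min():.3f}, {ratio.max():.3f}]")
# At these parameters the ratios typically land well inside [1 - eps, 1 + eps].

The talk's construction uses this lemma cryptographically; the snippet above only shows the statistical phenomenon the lemma describes.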
This talk is based on upcoming work with Andrej Bogdanov and Alon Rosen.