Certified Defenses against Adversarial Examples
Speaker
Aditi Raghunathan
Stanford
Host
Akshay Degwekar, Pritish Kamath and Govind Ramnarayan
MIT CSAIL
Abstract: While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but often followed by new, stronger attacks that defeat these defenses.
Can we somehow end this arms race? In this talk, I will present some methods based on convex relaxations (with a focus on semidefinite programming) that output a certificate that for a given network and test input, no attack can force the error to exceed a certain value. I will then discuss how these certification procedures can be incorporated into neural network training to obtain provably robust networks. Finally, I will present some empirical results on the performance of attacks and different certificates on networks trained using different objectives.
Joint work with Jacob Steinhardt and Percy Liang.
Can we somehow end this arms race? In this talk, I will present some methods based on convex relaxations (with a focus on semidefinite programming) that output a certificate that for a given network and test input, no attack can force the error to exceed a certain value. I will then discuss how these certification procedures can be incorporated into neural network training to obtain provably robust networks. Finally, I will present some empirical results on the performance of attacks and different certificates on networks trained using different objectives.
Joint work with Jacob Steinhardt and Percy Liang.