Explorations in robust optimization of deep networks for adversarial examples: provable defenses, threat models, and overfitting


Carnegie Mellon University (CMU)


Aleksander Madry
While deep networks have driven major leaps in raw performance across a wide range of applications, they are also known to be brittle to targeted data perturbations, so-called adversarial examples. This brittleness poses a serious risk for safety- and security-critical applications, where reliability and robustness are essential.

In this talk, we discuss several approaches for mitigating the effect of adversarial examples, each offering a different degree and type of robustness. We first discuss provable defenses, which can guarantee that no adversarial example exists within an ℓp-norm-bounded region around an input. Next, we study alternative threat models for adversarial examples, such as the Wasserstein threat model and the union of multiple threat models. Finally, we present some unexpected findings on the robust learning problem, showing that weak adversaries can be sufficient for training and that overfitting is a dominant phenomenon in adversarially robust training.