Neural networks have been shown to misclassify, with high confidence, examples that differ only slightly from correctly classified ones. These misclassified examples are known as adversarial examples, and they expose flaws in how the neural network architecture shapes its decision function. Previous approaches to finding adversarial examples have relied on stochastic search to find an example close to the original, but they cannot determine the distance to the closest adversarial example. We capitalize on the piecewise-linear nature of popular activation functions used within a neural network, expressing the problem of finding the closest adversarial example as a mixed-integer program. The advantage of our approach is that we are able to provide formal verification of the performance of a network. The magnitude of the minimum perturbation could also be incorporated into the cost function during training to improve the robustness of the neural network to adversarial examples.