Artificial intelligence (AI) systems known as neural networks are increasingly used in technologies like self-driving cars to see and recognize objects. Such systems could even help with tasks like identifying explosives in airport security lines.
But in many respects they're black boxes: even the researchers who develop them don't know exactly how they work or how they could be fooled.
Imagine, for example, if a terrorist could make small tweaks to the physical design of a bomb so that it evades detection by a TSA screening device.
While we're years away from a scenario that terrifying, this week CSAIL researchers showed how much higher the stakes could be: in a new paper, they demonstrate the first-ever method of producing real-world 3D objects that consistently fool neural networks.
The team shows that they can not only fool a neural network into thinking that a gun is no longer a gun; they can fool it into classifying a physical object as anything they want. By slightly changing an object's texture, the team's method could produce a bomb that gets classified as a tomato, or potentially even make an object effectively invisible to the network.
For example, the team 3D-printed a toy turtle that was misclassified as a rifle and a baseball that was misclassified as an espresso, no matter what angle the neural network viewed them from.
“This work clearly shows that something is broken with how neural networks work, and that researchers who develop these systems need to be spending a lot more time thinking about defending against these sorts of so-called ‘adversarial examples,’” says PhD candidate Anish Athalye, who was one of the lead authors on the new paper. “If we want safe self-driving cars and other systems that use neural networks, this is an area of research that needs to be the focus of much more study.”
The project builds on a growing body of work on "adversarial examples." For years, researchers have shown that making small changes to an image's pixels can fool neural networks, but such corner cases have often been viewed more as an intellectual curiosity than as something to worry about in the real world.
This is largely because researchers have mostly only been able to trick the systems using static 2D images; as you move a physical 3D object around, the network finds angles from which it can classify the object correctly.
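To give a flavor of what a classic pixel-level adversarial example looks like, here is a minimal sketch of the fast gradient sign method (Goodfellow et al., 2014), one of the simplest such attacks, written in PyTorch. This is not the method from the CSAIL paper; `model`, `image`, and `label` are placeholders for any image classifier, a single-image batch, and its true label.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Fast gradient sign method: nudge every pixel a tiny step in the
    direction that increases the classifier's loss, so the image looks
    unchanged to a human but is often misclassified by the network."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)   # how "wrong" the model is
    loss.backward()                               # gradient of the loss w.r.t. the pixels
    adversarial = image + epsilon * image.grad.sign()
    return torch.clamp(adversarial, 0.0, 1.0).detach()  # keep pixels in the valid range
```

An attack like this fools the network for one fixed image, which is exactly why such examples tended to fall apart once a physical object was viewed from a different angle.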
The MIT team's new method, in contrast, can generate adversarial examples that continue to fool networks across any chosen distribution of transformations, no matter how the object is distorted or repositioned within that distribution. (Developing the method required accounting for a range of complications, from lighting to camera noise.)
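The rough flavor of such a transformation-robust attack can be sketched as follows. This is a simplified illustration under assumed names, not the authors' implementation: `transforms` stands in for the chosen distribution of transformations (differentiable PyTorch operations that rotate, rescale, relight, or add noise to an image), and the loop approximates the expectation by sampling one transformation per step.

```python
import random
import torch
import torch.nn.functional as F

def transformation_robust_attack(model, image, target_class, transforms,
                                 steps=200, lr=0.01, eps=0.05):
    """Illustrative sketch: optimize a small perturbation so that the
    *transformed* image is classified as `target_class` on average over a
    whole distribution of transformations (rotations, rescalings, lighting
    changes, noise, ...), rather than for one fixed view."""
    delta = torch.zeros_like(image, requires_grad=True)   # the perturbation
    optimizer = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        t = random.choice(transforms)                     # sample a transformation
        logits = model(t(torch.clamp(image + delta, 0.0, 1.0)))
        loss = F.cross_entropy(logits, target)            # push toward the target class
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                       # keep the change subtle
    return torch.clamp(image + delta, 0.0, 1.0).detach()
```

Because the perturbation is trained against many sampled views rather than a single snapshot, it keeps fooling the classifier as the object is moved, rotated, or relit, which is what makes a physical, 3D-printed adversarial object possible.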
The team, which also includes master's student Andrew Ilyas and undergraduate Logan Engstrom, as well as Kevin Kwok BS '17, is quick to caution that there is no evidence this kind of manipulation is actually happening. 3D printing is tedious, complicated and expensive, and other factors would make something like disguising a bomb extremely difficult to pull off (e.g., the TSA's use of X-ray technology for imaging).
But the team is still eager to point out the possibilities, if only to highlight the seriousness of the issue and the importance of deep-learning researchers understanding the limits of their creations.
The paper is currently under review for the 2018 International Conference on Learning Representations (ICLR).