Q&A: Phillip Isola on the art and science of generative models


If you’ve ever wondered what a loaf of bread would look like as a cat, edges2cats is for you. The program that turns sketches into images of cats is one of many whimsical creations inspired by Phillip Isola’s image-to-image translation software released in the early days of generative adversarial networks, or GANs. In a 2016 paper, Isola and his colleagues showed how a new type of GAN could transform a hand-drawn shoe into its fashion-photo equivalent, or turn an aerial photo into a grayscale map. Later, the researchers showed how landscape photos could be reimagined in the impressionist brushstrokes of Monet or Van Gogh. Now an assistant professor in MIT’s Department of Electrical Engineering and Computer Science, Isola continues to explore what GANs can do.

GANs work by pairing two neural networks, trained on a large set of images. One network, the generator, outputs an image patterned after the training examples. The other network, the discriminator, rates how well the generator’s output image resembles the training data. If the discriminator can tell it’s a fake, the generator tries again and again until its output images are indistinguishable from the examples. When Isola first heard of GANs, he was experimenting with nearest-neighbor algorithms to try to infer the underlying structure of objects and scenes. 
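
The back-and-forth described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the networks used in Isola's work: the "images" are single numbers drawn from a bell curve centered at 4, the generator is a straight line `a*z + b`, and the discriminator is a simple logistic scorer. All function names, the learning rate, and the target distribution are assumptions made for this example.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def discriminator(x, d_params):
    """Estimated probability that sample x came from the training data."""
    w, c = d_params
    return sigmoid(w * x + c)


def generator(z, g_params):
    """Map a noise value z to a generated sample."""
    a, b = g_params
    return a * z + b


def discriminator_objective(real, fake, d_params):
    """Mean of log D(real) and log(1 - D(fake)); the discriminator maximizes it."""
    total = sum(math.log(discriminator(x, d_params)) for x in real)
    total += sum(math.log(1.0 - discriminator(x, d_params)) for x in fake)
    return total / (len(real) + len(fake))


def generator_objective(noise, g_params, d_params):
    """Mean of log D(G(z)); the generator maximizes it (non-saturating form)."""
    scores = [discriminator(generator(z, g_params), d_params) for z in noise]
    return sum(math.log(p) for p in scores) / len(scores)


def discriminator_step(real, fake, d_params, lr):
    """One gradient-ascent step: get better at telling real from fake."""
    w, c = d_params
    grad_w = grad_c = 0.0
    for x in real:  # gradient of log D(x)
        p = discriminator(x, d_params)
        grad_w += (1.0 - p) * x
        grad_c += 1.0 - p
    for x in fake:  # gradient of log(1 - D(x))
        p = discriminator(x, d_params)
        grad_w -= p * x
        grad_c -= p
    n = len(real) + len(fake)
    return (w + lr * grad_w / n, c + lr * grad_c / n)


def generator_step(noise, g_params, d_params, lr):
    """One gradient-ascent step: make fakes the discriminator rates as real."""
    a, b = g_params
    w, _ = d_params
    grad_a = grad_b = 0.0
    for z in noise:  # chain rule through D(G(z))
        p = discriminator(generator(z, g_params), d_params)
        grad_a += (1.0 - p) * w * z
        grad_b += (1.0 - p) * w
    n = len(noise)
    return (a + lr * grad_a / n, b + lr * grad_b / n)


def train(steps=200, batch=64, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on toy 1-D data."""
    rng = random.Random(seed)
    d_params, g_params = (0.1, 0.0), (1.0, 0.0)  # arbitrary starting values
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]  # assumed "training set"
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [generator(z, g_params) for z in noise]
        d_params = discriminator_step(real, fake, d_params, lr)
        g_params = generator_step(noise, g_params, d_params, lr)
    return g_params, d_params
```

Calling `train()` returns the final generator and discriminator parameters. In a toy setup like this, the two players can keep circling each other rather than settling down, which hints at why full-scale GANs have a reputation for being tricky to train.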

GANs have an uncanny ability to get at the essential structure of a place, face, or object, making structured prediction easier. Introduced five years ago, GANs have been used to visualize the ravages of climate change, produce more realistic computer simulations, and protect sensitive data, among other applications. 

To connect the growing number of GAN enthusiasts at MIT and beyond, Isola has recently helped to organize GANocracy, a day of talks, tutorials, and posters being held at MIT on May 31 that is co-sponsored by the MIT Quest for Intelligence and MIT-IBM Watson AI Lab. Isola recently spoke about the future of GANs.

Q: Your image-to-image translation paper has more than 2,000 citations. What made it so popular?

A: It was one of the earlier papers to show that GANs are useful for predicting visual data. We showed that this setting is very general, and can be thought of as translating between different visualizations of the world, which we called image-to-image translation. GANs were originally proposed as a model for producing realistic images from scratch. But the most useful application may be structured prediction, which is what GANs are mostly being used for these days.

Q: GANs are easily customized and shared on social media. Any favorites among these projects?

A: #Edges2cats is probably my favorite, and it helped to popularize the framework early on. Architect Nono Martínez Alonso has used pix2pix for exploring interesting tools for sketch-based design. I like everything by Mario Klingemann; Alternative Face is especially thought-provoking. It puts one person’s words into someone else’s mouth, hinting at a potential future of “alternative facts.” Scott Eaton is pushing the limits of GANs by translating sketches into 3-D sculptures. 

Q: What other GAN art grabs you?

A: I really like all of it. One remarkable example is GANbreeder. It’s a human-curated evolution of GAN-generated images. The crowd chooses which images to breed or kill off. Over many generations, we end up with beautiful and unexpected images.

Q: How are GANs being used beyond art? 

A: In medical imaging, they’re being used to generate CT scans from MRIs. There’s potential there, but it can be easy to misinterpret the results: GANs are making predictions, not revealing the truth. We don't yet have good ways to measure the uncertainty of their predictions. I'm also excited about the use of GANs for simulations. Robots are often trained in simulators to reduce costs, creating complications when we deploy them in the real world. GANs can help bridge the gap between simulation and reality.

Q: Will GANs redefine what it means to be an artist?

A: I don't know, but it's a super-interesting question. Several of our GANocracy speakers are artists, and I hope they will touch on this. GANs and other generative models are different from other kinds of algorithmic art. They are trained to imitate, so the people being imitated probably deserve some credit. The art collective Obvious recently sold a GAN image at Christie's for $432,500. Obvious selected the image, signed it, and framed it, but the code was derived from work by then-17-year-old Robbie Barrat. Ian Goodfellow helped develop the underlying algorithm.

Q: Where is the field heading?

A: As amazing as GANs are, they are just one type of generative model. GANs might eventually fade in popularity, but generative models are here to stay. As models of high-dimensional structured data, generative models get close to what we mean when we say “create,” “visualize,” and “imagine.” I think they will be used more and more to approximate capabilities that still seem uniquely human. But GANs do have some unique properties. For one, they solve the generative modeling problem via a two-player competition, creating a generator-discriminator arms race that leads to emergent complexity. Arms races show up across machine learning, including in the AI that achieved superhuman abilities in the game Go.

Q: Are you worried about the potential abuse of GANs?

A: I’m definitely concerned about the use of GANs to generate and spread misleading content, or so-called fake news. GANs make it a lot easier to create doctored photos and videos. You no longer have to be a video-editing expert to make it look like a politician is saying something they never actually said.

Q: You and the other GANocracy organizers are advocating for so-called GANtidotes. Why?

A: We would like to inoculate society against the misuse of GANs. We could all just stop trusting what we see online, but then we’d risk losing touch with reality. I’d like to preserve a future in which “seeing is believing.” Luckily, many people are working on technical antidotes that range from detectors that seek out the telltale artifacts in a GAN-manipulated image to cryptographic signatures that verify that a photo has not been edited since it was taken. There are a lot of ideas out there, so I’m optimistic the problem can be solved.
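
As a rough illustration of the signature idea mentioned above, the sketch below tags a photo's bytes at capture time and later checks whether they have changed. It uses a shared-secret HMAC from Python's standard library as a stand-in; a real system would more likely rely on public-key signatures, and the function names, key, and sample bytes here are hypothetical.

```python
import hashlib
import hmac


def sign_photo(image_bytes, key):
    """Compute a tag over the exact bytes of the photo (done at capture time)."""
    return hmac.new(key, image_bytes, hashlib.sha256).hexdigest()


def is_unedited(image_bytes, tag, key):
    """Recompute the tag and compare it to the one stored with the photo."""
    expected = sign_photo(image_bytes, key)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, tag)


# Hypothetical usage: any change to the bytes invalidates the tag.
camera_key = b"secret-key-held-by-the-camera"  # assumption for this sketch
original = b"raw bytes of the photo as captured"
tag = sign_photo(original, camera_key)
edited = original + b" plus a doctored region"
```

Here `is_unedited(original, tag, camera_key)` succeeds, while the same check on `edited` fails. A check like this only shows that the bytes changed, not what was changed or why.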