Computer drawings fool human judges, pass “visual Turing test”

Humans and machines were given an image of a novel character (top) and asked to produce new versions. A machine generated the nine-character grid on the left.

Researchers at CSAIL, New York University, and the University of Toronto have developed a computer system whose ability to produce a variation of a character in an unfamiliar writing system, on the first try, is indistinguishable from that of humans.

That means that the system in some sense discerns what’s essential to the character — its general structure — but also what’s inessential — the minor variations characteristic of any one instance of it.

As such, the researchers argue, their system captures something of the elasticity of human concepts, which often have fuzzy boundaries but still seem to delimit coherent categories. It also mimics the human ability to learn new concepts from few examples. It thus offers hope, they say, that the type of computational structure it’s built on, called a probabilistic program, could help model human acquisition of more sophisticated concepts as well.

“In the current AI landscape, there’s been a lot of focus on classifying patterns,” says Josh Tenenbaum, a professor in the Department of Brain and Cognitive sciences at MIT, a principal investigator in the MIT Center for Brains, Minds and Machines, and one of the new system’s co-developers. “But what’s been lost is that intelligence isn’t just about classifying or recognizing; it’s about thinking.”

“This is partly why, even though we’re studying hand-written characters, we’re not shy about using a word like ‘concept,’” he adds. “Because there are a bunch of things that we do with even much richer, more complex concepts that we can do with these characters. We can understand what they’re built out of. We can understand the parts. We can understand how to use them in different ways, how to make new ones.”

The new system was the thesis work of Brenden Lake, who earned his PhD in cognitive science from MIT last year as a member of Tenenbaum’s group, and who won the Glushko Prize for outstanding dissertations from the Cognitive Science Society. Lake, who is now a postdoc at New York University, is first author on a paper describing the work in the latest issue of the journal Science. He’s joined by Tenenbaum and Ruslan Salakhutdinov, an assistant professor of computer science at the University of Toronto who was a postdoc in Tenenbaum’s group from 2009 to 2011.

Rough ideas

“We analyzed these three core principles throughout the paper,” Lake says. “The first we called compositionality, which is the idea that representations are built up from simpler primitives. Another is causality, which is that the model represents the abstract causal structure of how characters are generated. And the last one was learning to learn, this idea that knowledge of previous concepts can help support the learning of new concepts. Those ideas are relatively general. They can apply to characters, but they could apply to many other types of concepts.”

The researchers subjected their system to a battery of tests. In one, they presented it with a single example of a character in a writing system it had never seen before and asked it to produce new instances of the same character — not identical copies, but nine different variations on the same character. In another test, they presented it with several characters in an unfamiliar writing system and asked it to produce new characters that were in some way similar. And in a final test, they asked it to make up entirely new characters in a hypothetical writing system.

Human subjects were then asked to perform the same three tasks. Finally, a separate group of human judges was asked to distinguish the human subjects’ work from the machine’s. Across all three tasks, the judges could identify the machine outputs with about 50 percent accuracy — no better than chance.

Conventional machine-learning systems — such as the ones that led to the speech-recognition algorithms on smartphones — often perform very well on constrained classification tasks, but they must first be trained on huge sets of training data. Humans, by contrast, frequently grasp concepts after just a few examples. That type of “one-shot learning” is something that the researchers designed their system to emulate.

Learning to learn

Like a human subject, however, the system comes to a new task with substantial background knowledge, which in this case is captured by a probabilistic program. Whereas a conventional computer program systematically decomposes a high-level task into its most basic computations, a probabilistic program requires only a very sketchy model of the data it will operate on. Inference algorithms then fill in the details of the model by analyzing a host of examples.

Here, the researchers’ model specified that characters in human writing systems consist of strokes, demarcated by the lifting of the pen, and that the strokes consist of substrokes, demarcated by points at which the pen’s velocity is zero.

Armed with that model, the system then analyzed hundreds of motion-capture recordings of humans drawing characters in several different writing systems, learning statistics on the relationships between consecutive strokes and substrokes as well as on the variation tolerated in the execution of a single stroke.

The system was never trained, however, on the specific writing systems that it analyzed in the researchers’ tests. There, it was simply applying what it had inferred about human writing in general. “It’s learning a bunch of probabilities in a generative program, and that is a generative program for programs,” Tenenbaum says. “Your representation of one of these visual concepts is itself a program that can generate probabilistically different outcomes.”

“I feel that this is a major contribution to science, of general interest to artificial intelligence, cognitive science, and machine learning,” says Zoubin Ghahramani, a professor of information engineering at the University of Cambridge. “Given the major successes of deep learning, the paper also provides a very sobering view of the limitations of such deep-learning methods — which are very data-hungry and perform poorly on the tasks in this paper — and an important alternative avenue for achieving human-level machine learning.”