Hearing by seeing: “visual mic” uses potato-chip bag to recover sound from video footage
CSAIL researchers have helped develop an algorithm that reconstructs audio signals by analyzing minute vibrations of objects depicted in video.
Researchers at MIT CSAIL, Microsoft, and Adobe have developed an algorithm that can reconstruct an audio signal by analyzing minute vibrations of objects depicted in video. In one set of experiments, they were able to recover intelligible speech from the vibrations of a potato-chip bag photographed from 15 feet away through soundproof glass.
In other experiments, they extracted useful audio signals from videos of aluminum foil, the surface of a glass of water, and even the leaves of a potted plant. The researchers will present their findings in a paper at this year’s Siggraph, the premier computer graphics conference.
“When sound hits an object, it causes the object to vibrate,” says Abe Davis, a CSAIL graduate student and first author on the new paper. “The motion of this vibration creates a very subtle visual signal that’s usually invisible to the naked eye. People didn’t realize that this information was there.”
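The principle Davis describes can be illustrated with a toy sketch. This is not the authors' algorithm (which analyzes local motion across many spatial scales and orientations); it is a minimal, assumption-laden demo in which a sound-driven vibration slightly modulates the brightness of an imaged patch, and averaging each frame yields one audio sample whose dominant frequency an FFT can recover. The frame rate, tone frequency, and modulation depth below are all hypothetical.

```python
# Toy sketch (NOT the paper's method): a vibrating object is simulated as
# a tiny per-frame brightness modulation; one sample per frame plus an
# FFT recovers the driving tone's frequency.
import numpy as np

FPS = 2400      # assumed high-speed camera frame rate (hypothetical)
DURATION = 1.0  # seconds of simulated "video"
TONE_HZ = 440   # hypothetical tone driving the object's vibration

t = np.arange(int(FPS * DURATION)) / FPS
rng = np.random.default_rng(0)

# Simulated 16x16-pixel patch: baseline brightness, a minute
# sound-induced modulation, and sensor noise.
frames = (0.5
          + 0.001 * np.sin(2 * np.pi * TONE_HZ * t)[:, None, None]
          + 0.0005 * rng.standard_normal((t.size, 16, 16)))

# "Visual microphone" in miniature: one audio sample per video frame.
signal = frames.mean(axis=(1, 2))
signal -= signal.mean()

# Find the dominant vibration frequency in the recovered signal.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / FPS)
recovered = freqs[spectrum.argmax()]
print(f"Recovered tone: {recovered:.0f} Hz")  # → 440 Hz
```

Averaging the patch suppresses per-pixel sensor noise, which is why a modulation far too small to see survives in the per-frame mean; the real system goes much further, extracting sub-pixel motions rather than raw brightness changes.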
Joining Davis on the Siggraph paper are CSAIL researchers Frédo Durand and Bill Freeman, both professors of computer science and engineering; graduate student Neal Wadhwa; Michael Rubinstein of Microsoft Research, who did his PhD with Freeman; and Gautham Mysore of Adobe Research.