As movies become more CGI-focused, filmmakers have to be increasingly adept at “compositing” - the process of merging foreground and background images, like placing actors on top of planes or planets, or into fictional worlds like Black Panther’s Wakanda.
Making these images look realistic isn’t easy. Editors have to capture the subtle aesthetic transitions between foreground and background, which can be especially difficult for intricate materials like human hair, since people are used to seeing it look a certain way.
“The tricky thing about these images is that not every pixel solely belongs to one object,” says Yagiz Aksoy, a visiting researcher at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). “In many cases it can be hard to determine which pixels are part of the background and which are part of a specific person.”
Getting these details right is tedious, time-consuming and difficult for anyone but the most seasoned of editors. But in a new paper, Aksoy and his colleagues at MIT CSAIL demonstrate a way to use machine learning to automate many parts of the editing process for photos - and say that the approach could also be used for moving images.
The team’s approach automatically decomposes an image into a set of layers separated by a series of “soft transitions,” rather than hard boundaries.
Dubbed “semantic soft segmentation” (SSS), the system analyzes the original image’s texture and color and combines them with information gleaned by a neural network about what the objects within the image actually are.
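To illustrate the general idea (this is a minimal sketch, not the authors’ actual implementation), the snippet below combines low-level color cues with high-level semantic features into per-pixel “soft” layer weights that sum to one, so boundary pixels such as wisps of hair can belong partially to several layers. The `soft_segments` function, the toy `semantic_features` array, and the `sigma_*` parameters are hypothetical stand-ins for the learned features and optimization described in the paper.

```python
# Illustrative sketch of soft segmentation: mix color and semantic cues,
# then assign each pixel fractional membership in every layer.
import numpy as np

def soft_segments(image, semantic_features, n_layers=2, sigma_color=0.2, sigma_sem=0.5):
    """Return an (H, W, n_layers) array of soft layer weights that sum to 1 per pixel."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3)                                      # low-level cue: color
    sem = semantic_features.reshape(-1, semantic_features.shape[-1])   # high-level cue (stand-in for a network's output)
    feats = np.concatenate([pixels / sigma_color, sem / sigma_sem], axis=1)

    # Pick n_layers seed pixels spread far apart in feature space.
    seeds = [0]
    for _ in range(n_layers - 1):
        d = np.min([np.sum((feats - feats[s]) ** 2, axis=1) for s in seeds], axis=0)
        seeds.append(int(np.argmax(d)))

    # Soft assignment: boundary pixels get split between layers instead of
    # being forced to belong to exactly one object.
    dists = np.stack([np.sum((feats - feats[s]) ** 2, axis=1) for s in seeds], axis=1)
    weights = np.exp(-dists)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.reshape(h, w, n_layers)

# Tiny synthetic example: left half "red object", right half "blue background",
# with made-up 2-D semantic features distinguishing the two regions.
img = np.zeros((4, 4, 3)); img[:, :2, 0] = 1.0; img[:, 2:, 2] = 1.0
sem = np.zeros((4, 4, 2)); sem[:, :2, 0] = 1.0; sem[:, 2:, 1] = 1.0
layers = soft_segments(img, sem)
print(layers.sum(axis=-1))  # every pixel's layer weights sum to 1
```

In this toy setup the weights end up nearly hard because the regions are cleanly separated; the value of soft segments shows up at ambiguous boundary pixels, where the weights split between layers.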
“Once these soft segments are computed, the user doesn’t have to manually change transitions or make individual modifications to the appearance of a specific layer of an image,” says Aksoy, who presented the paper at the annual SIGGRAPH computer-graphics conference in Vancouver this past week. “Manual editing tasks like replacing backgrounds and adjusting colors would be made much easier.”
To be clear, SSS is currently focused on static images. But the team says that using it for videos is in the foreseeable future and would open up many filmmaking applications.
“Instead of needing an expert editor to spend several minutes tweaking an image frame-by-frame and pixel-by-pixel, we’d like to make the process simpler and faster so that image-editing can be more accessible to casual users,” says Aksoy. “The vision is to get to a point where it just takes a single click for editors to combine images to create these full-blown, realistic fantasy worlds.”
Aksoy says that, in its current iteration, SSS could be used by social platforms like Instagram and Snapchat to make their filters more realistic, particularly for changing the backgrounds on selfies or simulating specific kinds of cameras. In the future, the researchers plan to work to shorten the time it takes to compute an image from minutes to seconds, and to make images even more realistic by improving the system’s ability to match colors and handle things like illumination and shadows.
The paper was co-written by MIT associate professor Wojciech Matusik and CSAIL postdoctoral associate Tae-Hyun Oh, as well as Sylvain Paris of Adobe Research and Marc Pollefeys of ETH Zurich and Microsoft.