We've developed an object-based neural network architecture for learning predictive models of intuitive physics. It extrapolates to a variable number of objects and to variable scene configurations using only spatially and temporally local computation.

This project links two levels of factorization and composition in learning physical dynamics.

On the level of the physics program, the Neural Physics Engine (NPE) architecture explicitly reflects a causal structure in object interactions by factorizing object dynamics into pairwise interactions. As a predictive model of physical dynamics, the NPE models the future state of a single object as a function composition of the pairwise interactions between itself and other neighboring objects in the scene.
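The pairwise factorization described above can be sketched as follows. This is an illustrative mock-up, not the trained model: the function names (`f_pair`, `npe_step`), the state and effect dimensions, and the random linear maps standing in for learned MLPs are all assumptions made for the sketch. The key structural point it demonstrates is that the effects of neighbors on a focus object are computed pairwise and then summed, so the prediction is invariant to neighbor ordering and works for any neighbor count.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, EFFECT_DIM = 4, 8  # illustrative sizes, not the paper's

# Stand-in "networks": random linear maps with a nonlinearity.
# In the real model these would be trained neural networks.
W_pair = rng.normal(size=(2 * STATE_DIM, EFFECT_DIM))
W_pred = rng.normal(size=(STATE_DIM + EFFECT_DIM, STATE_DIM))

def f_pair(focus, neighbor):
    """Encode the pairwise effect of `neighbor` on the `focus` object."""
    return np.tanh(np.concatenate([focus, neighbor]) @ W_pair)

def npe_step(states, focus_idx, neighbor_mask):
    """Predict the next state of one object from its local neighborhood.

    Pairwise effects are summed -- a symmetric composition, so the
    prediction does not depend on how many neighbors there are or
    in what order they are visited.
    """
    focus = states[focus_idx]
    effect = np.zeros(EFFECT_DIM)
    for j, neighbor in enumerate(states):
        if j != focus_idx and neighbor_mask[j]:
            effect += f_pair(focus, neighbor)
    return np.concatenate([focus, effect]) @ W_pred

states = rng.normal(size=(5, STATE_DIM))   # a scene with 5 objects
mask = np.ones(5, dtype=bool)              # all objects in the neighborhood
pred = npe_step(states, focus_idx=0, neighbor_mask=mask)
print(pred.shape)  # (4,): predicted next state of the focus object
```

Because the same `f_pair` is applied to every pair and the results are summed, the same trained weights apply unchanged to scenes with more or fewer objects, which is what enables the extrapolation described below.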

On the level of the physical scene, we factorize the scene into object-based representations, and compose smaller building blocks to form larger objects. This design allows the NPE to extrapolate to a variable number of objects and to variable scene configurations with only spatially and temporally local computation.

Our approach draws on the strengths of both symbolic and neural approaches: like a symbolic physics engine, the NPE is endowed with generic notions of objects and their interactions, but as a neural network it can also be trained via stochastic gradient descent to adapt to specific object properties and dynamics of different worlds.

The NPE makes two strong, but natural, assumptions about a physical environment:

  1. there exist objects, and
  2. these objects interact with each other in a factorized manner.

We evaluate the efficacy of our approach on simple rigid-body dynamics in two-dimensional worlds. By comparing to less structured architectures, we show that our model’s compositional representation of the structure in physical interactions improves its ability to predict movement, generalize to different numbers of objects, and infer latent properties of objects such as mass.