Visual recognition is affected by changes in visual appearance (pose, scale, rotation) that do not affect a scene’s semantic category. We introduce a method for using sets of transforming examples to learn representations that are robust to these changes.

The sample complexity of a learning task is increased by the presence of transformations in the input space that preserve class identity. Visual object recognition, for example, the discrimination or categorization of distinct semantic classes, is affected by changes in viewpoint, scale, illumination, and planar transformations. While drastically altering the visual appearance of a scene, these changes do not shift the semantic category. We introduce a framework for using sets of transforming examples, or orbits, to learn representations that are robust and selective. We train deep encoders that explicitly account for the equivalence of orbit sets up to transformations and show that the resulting encodings contract the intra-orbit distance and preserve identity either via reconstruction or via increasing the inter-orbit distance. We explore a loss function that combines a discriminative term, adapted to orbits, with a reconstruction term that uses a decoder-encoder map to learn to rectify transformation-perturbed examples. We provide examples of such transformation orbits and their use for obtaining deep representations. We demonstrate the validity of the learned embeddings for one-shot learning on a dataset with simple geometric transformations, as well as for face verification and retrieval in the presence of more complex visual variability. Our results suggest that a suitable definition of the orbit sets is a form of weak supervision that can be exploited to learn semantically relevant embeddings.
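To make the combined objective concrete, below is a minimal sketch of what such a loss could look like, assuming a PyTorch encoder-decoder pair. The batch layout (B orbits of K transformed examples plus one canonical example per orbit), the margin, the weighting lambda_rec, and the name orbit_loss are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an orbit-based loss: a discriminative term adapted to
# orbits plus a reconstruction term. Hyperparameters and batch layout are
# assumptions made for illustration.
import torch
import torch.nn.functional as F


def orbit_loss(encoder, decoder, orbits, canonical, margin=1.0, lambda_rec=0.5):
    """orbits: (B, K, C, H, W) -- B orbits, each with K transformed examples.
    canonical: (B, C, H, W) -- one reference (untransformed) example per orbit."""
    B, K = orbits.shape[:2]
    z = encoder(orbits.flatten(0, 1)).view(B, K, -1)   # (B, K, D) embeddings

    # Discriminative term adapted to orbits:
    # contract intra-orbit distances around each orbit's centroid ...
    intra = (z - z.mean(dim=1, keepdim=True)).pow(2).sum(-1).mean()
    # ... and push distinct orbit centroids at least `margin` apart.
    centroids = z.mean(dim=1)                          # (B, D)
    d = torch.cdist(centroids, centroids)              # (B, B) pairwise distances
    off_diag = ~torch.eye(B, dtype=torch.bool, device=d.device)
    inter = F.relu(margin - d[off_diag]).mean()

    # Reconstruction term: decode each transformed example back to the
    # canonical (rectified) view of its orbit.
    recon = decoder(z.flatten(0, 1)).view(B, K, *canonical.shape[1:])
    rec = F.mse_loss(recon, canonical.unsqueeze(1).expand_as(recon))

    return intra + inter + lambda_rec * rec
```

Contracting intra-orbit distances while separating orbit centroids gives the robust-yet-selective behavior described above, and the reconstruction term preserves identity by forcing each transformation-perturbed view to decode to its orbit's canonical example.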
