We examine the efficacy of various approximate inference methods for learning probabilistic models.

As we gather more data about the world, machines are able to perform increasingly complicated tasks in far-reaching fields, from making inferences about cancer to synthesizing realistic images. This change is driven not only by an increase in the amount of data we have, but also by our use of increasingly rich representations, or models, of the world.
Unfortunately, for large datasets or moderately complex models, exact probabilistic inference is either computationally prohibitive or analytically intractable. Thus, in practice, some form of approximate inference is required. Variational Bayes (VB) is one such method, which has proven to be computationally fast on large datasets. However, it is known to sometimes produce inaccurate results, and these inaccuracies may be compounded in complex models. At its core, VB posits a simpler, approximate family of distributions and then minimizes a measure of dissimilarity -- the Kullback-Leibler (KL) divergence -- between the approximation and the exact model. We consider using alternative dissimilarity measures and seek to understand whether these measures may have superior statistical or computational properties for specific machine learning problems.
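To make the VB objective concrete, one common formulation (using assumed notation: latent variables $z$, observed data $x$, and an approximating family $\mathcal{Q}$) can be sketched as follows:

```latex
% VB seeks the member of the approximating family closest in KL divergence
% to the exact posterior:
q^* \;=\; \operatorname*{arg\,min}_{q \in \mathcal{Q}} \;
  \mathrm{KL}\!\left( q(z) \,\|\, p(z \mid x) \right).

% Since p(z | x) is itself intractable, this is typically recast as
% maximizing the evidence lower bound (ELBO):
\mathcal{L}(q) \;=\; \mathbb{E}_{q}\!\left[ \log p(x, z) \right]
  \;-\; \mathbb{E}_{q}\!\left[ \log q(z) \right],
\qquad
\log p(x) \;=\; \mathcal{L}(q) \;+\; \mathrm{KL}\!\left( q(z) \,\|\, p(z \mid x) \right).
```

Because $\log p(x)$ is constant in $q$, maximizing $\mathcal{L}(q)$ is equivalent to minimizing the KL divergence; replacing the KL term with an alternative dissimilarity measure changes which properties of the exact posterior the approximation preserves.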