Project

Scalable Bayesian Inference via Adaptive Data Summaries

To obtain scalable Bayesian inference methods, we develop algorithms to create compact “summaries” of large quantities of data. We can then quickly run standard inference algorithms on these summaries without needing to look at the whole dataset.

The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical models, uncertainty quantification, and prior specification they provide. However, standard Bayesian inference algorithms are computationally expensive, making their direct application to large datasets difficult or infeasible. In certain models (known as exponential families) we can use sufficient statistics to summarize arbitrary amounts of data with just a finite set of numbers. Most models, however, do not admit a finite number of sufficient statistics, so the inference algorithm must keep and repeatedly reference all of the data, and inference becomes very slow as datasets grow large. In this project we are developing algorithms to approximately summarize a dataset in a model-specific way. These compact data summaries can then be used in place of the full dataset when performing Bayesian inference, leading to substantial gains in computational efficiency while only decreasing the accuracy of inferences by a known amount. We are applying our methods to a range of models from generalized linear models to Bayesian nonparametric ones.
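To make the sufficient-statistics case mentioned above concrete, here is a minimal sketch of the exact (exponential family) situation only, not of the project's approximate summaries. It assumes a Gaussian likelihood with known variance and a Gaussian prior on the mean, where the pair (count, sum) summarizes any amount of data. All function names and parameter values are hypothetical and chosen for illustration.

```python
import random

def summarize(data):
    """Sufficient statistics for a Gaussian mean with known variance: (n, sum of x)."""
    return len(data), sum(data)

def merge(s1, s2):
    """Summaries of disjoint chunks combine by simple addition."""
    return s1[0] + s2[0], s1[1] + s2[1]

def posterior_from_summary(summary, sigma2, mu0, tau2):
    """Posterior mean and variance of the Gaussian mean, using only the summary.

    sigma2: known observation variance; mu0, tau2: prior mean and variance.
    """
    n, sum_x = summary
    precision = 1.0 / tau2 + n / sigma2
    mean = (mu0 / tau2 + sum_x / sigma2) / precision
    return mean, 1.0 / precision

def posterior_from_data(data, sigma2, mu0, tau2):
    """Same posterior, computed by visiting every data point in turn."""
    mean, var = mu0, tau2
    for x in data:
        precision = 1.0 / var + 1.0 / sigma2
        mean = (mean / var + x / sigma2) / precision
        var = 1.0 / precision
    return mean, var

# Synthetic data (illustrative values only).
rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(10000)]

# Build the summary one chunk at a time; the full dataset is never needed at once.
summary = (0, 0.0)
for start in range(0, len(data), 1000):
    summary = merge(summary, summarize(data[start:start + 1000]))

mean_s, var_s = posterior_from_summary(summary, sigma2=1.0, mu0=0.0, tau2=10.0)
mean_f, var_f = posterior_from_data(data, sigma2=1.0, mu0=0.0, tau2=10.0)
```

In this conjugate setting the two posteriors agree up to floating-point error, and the summary stays two numbers no matter how large the dataset grows. For models without finite sufficient statistics no such exact summary exists, which is the gap the approximate summaries described above are meant to fill.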

Group

Machine Learning

Communities

Vertical AI Community of Research

Contact us

If you would like to contact us about our work, please refer to our members below and reach out to one of the group leads directly.

Last updated Apr 24 '20

Members

Tamara Broderick