The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical models, uncertainty quantification, and prior specification they provide. However, standard Bayesian inference algorithms are computationally expensive, making their direct application to large datasets difficult or infeasible. In certain models (known as exponential families) we can use sufficient statistics to summarize arbitrary amounts of data with just a finite set of numbers. However, most models do not admit a finite number of sufficient statistics, so we must retain all of the data; because inference must repeatedly reference the full dataset, it becomes very slow as datasets grow large. In this project we are developing algorithms to approximately summarize a dataset in a model-specific way. These compact data summaries can then be used in place of the full dataset when performing Bayesian inference, leading to substantial gains in computational efficiency while only decreasing the accuracy of inferences by a known amount. We are applying our methods to a range of models, from generalized linear models to Bayesian nonparametric ones.
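As a concrete illustration of the sufficient-statistics idea in exponential families (this is standard conjugate inference, not the project's approximate-summary method), here is a minimal sketch in Python/NumPy. It assumes a Normal likelihood with known unit variance and a standard Normal prior on the mean, so the posterior depends on the data only through two numbers: the count and the sum.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=100_000)

# For a Normal likelihood with known variance 1 and a N(0, 1) prior on
# the mean, the model is an exponential family: the posterior depends on
# the data only through the sufficient statistics (n, sum of x).
n = data.size
s = data.sum()

# Conjugate update computed from the summary alone -- the raw dataset
# is no longer needed once (n, s) are known.
prior_mean, prior_var = 0.0, 1.0
post_var = 1.0 / (1.0 / prior_var + n)  # likelihood variance = 1
post_mean = post_var * (prior_mean / prior_var + s)

print(post_mean, post_var)
```

For models without such finite summaries, no pair of numbers like `(n, s)` exists, which is the gap the project's approximate, model-specific summaries aim to fill.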
If you would like to contact us about our work, please scroll down to the people section and visit one of the group leads' pages, where you can reach out to them directly.