[Thesis Defense] Estimation, Prediction and Counterfactual Inference with Dependent Observations
Speaker
A common assumption underlying a variety of applications in data science is that observations are generated independently from a high-dimensional distribution. However, this assumption is called into question in many settings, ranging from econometrics to phylogenetics and image analysis, where data exhibit spatial, temporal, or network dependence.
In this talk, I will describe methods for supervised and unsupervised learning, as well as counterfactual inference, in settings with dependent data.

In the context of unsupervised learning, we assume access to one or multiple samples from a graphical model, either discrete (Ising model) or continuous (Gaussian graphical model). We study both fully observable and latent variable models, and design polynomial-time algorithms that either learn a parametric description of the distribution or estimate it non-parametrically.

In the context of supervised learning, we assume access to a sequence of feature-label pairs, where the binary labels are jointly sampled from an Ising model that depends on the features. We provide algorithms and statistically efficient estimation rates for the various parameters of this model.

In the context of counterfactual inference, we propose a general approach for constructing experiment designs under network interference. This yields the best known estimation rates for several well-studied causal effects (e.g., the global and direct effects) and also provides new methods for effects that have received less attention in the literature.
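To make the dependent-data setting concrete, the sketch below draws an approximate sample from an Ising model over binary spins via single-site Gibbs sampling. This is a standard textbook sampler, not one of the algorithms from the thesis; the function name and parameterization are illustrative assumptions.

```python
import numpy as np

def gibbs_sample_ising(J, h, n_steps=5000, rng=None):
    """Approximate sample from p(x) ∝ exp(0.5 * x'Jx + h'x), x in {-1,+1}^n,
    with symmetric interaction matrix J (zero diagonal assumed) and field h."""
    rng = np.random.default_rng(rng)
    n = len(h)
    x = rng.choice(np.array([-1, 1]), size=n)  # random initial configuration
    for _ in range(n_steps):
        i = rng.integers(n)
        # local field at site i given all other spins (exclude any self-term)
        field = J[i] @ x - J[i, i] * x[i] + h[i]
        # conditional: P(x_i = +1 | rest) = sigmoid(2 * field)
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
        x[i] = 1 if rng.random() < p_plus else -1
    return x

# Example: a small ferromagnetic chain with a positive external field
n = 5
J = np.zeros((n, n))
for i in range(n - 1):
    J[i, i + 1] = J[i + 1, i] = 1.0
sample = gibbs_sample_ising(J, h=0.5 * np.ones(n), n_steps=2000, rng=0)
```

Note that neighboring spins are dependent through J, which is exactly the departure from i.i.d. sampling that the talk addresses.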