Causal modelling with neural and kernel feature embeddings: treatment effects, counterfactuals, mediation, and proxies
Host
Stefanie Jegelka
MIT CSAIL
Abstract:
A fundamental causal modelling task is to predict the effect of an intervention (or treatment) D=d on outcome Y in the presence of observed covariates X. We can obtain an average treatment effect by marginalising our estimate \gamma(X,D) of the conditional mean E(Y|X,D) over P(X). More complex causal questions require taking conditional expectations. For instance, the average treatment effect on the treated (ATT) addresses a counterfactual: what is the outcome of an intervention d' on a subpopulation that received treatment d? In this case, we must marginalise \gamma over the conditional distribution P(X|d), which becomes challenging for continuous multivariate d. Many additional causal questions require us to marginalise over conditional distributions, including the conditional ATE, mediation analysis, dynamic treatment effects, and correction for unobserved confounders using proxy variables.
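To make the first estimator concrete, here is a minimal plug-in sketch (an illustration for this announcement, not the talk's implementation): fit \gamma by kernel ridge regression on synthetic confounded data, then average its predictions over the empirical distribution of X. The data-generating process and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: covariate X confounds treatment D and outcome Y,
# with true treatment effect dY/dD = 2.
n = 500
X = rng.normal(size=(n, 1))
D = 0.5 * X[:, 0] + rng.normal(size=n)
Y = 2.0 * D + X[:, 0] + rng.normal(scale=0.1, size=n)

def rbf(A, B, scale=2.0):
    """Gaussian kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * scale ** 2))

# Estimate gamma(X, D) ~ E[Y | X, D] by kernel ridge regression on (X, D).
Z = np.column_stack([X, D])
alpha = np.linalg.solve(rbf(Z, Z) + 1e-2 * np.eye(n), Y)

def gamma(Xq, dq):
    """Predicted conditional mean outcome at covariates Xq, fixed treatment dq."""
    Zq = np.column_stack([Xq, np.full(len(Xq), dq)])
    return rbf(Zq, Z) @ alpha

# ATE between treatment levels d'=1 and d=0: marginalise gamma over the
# empirical distribution of X (a sample average stands in for P(X)).
ate = gamma(X, 1.0).mean() - gamma(X, 0.0).mean()
```

With this smooth synthetic design, the plug-in estimate lands near the true effect of 2; the harder questions in the abstract arise when the average must instead be taken over a conditional distribution such as P(X|d).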
We address these questions in the nonparametric setting using mean embeddings, which represent distributions as expectations of neural network features (adaptive) or kernel features (fixed, infinite dimensional). These embeddings apply to very general treatments D and covariates X. We perform marginalisation over conditional distributions using conditional mean embeddings, in a generalisation of two-stage least-squares regression. We provide strong statistical guarantees under general smoothness assumptions, and straightforward, robust implementations for both NN and kernel features. The method is demonstrated on synthetic examples, and on causal modelling questions arising from the US Job Corps program for Disadvantaged Youth.
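The conditional-mean-embedding step can likewise be sketched in simplified form (again our illustration under toy assumptions, not the presented method): a ridge regression in treatment space yields weights beta(d) that approximate expectations over P(X|D=d), so a counterfactual quantity like the ATT becomes a beta-weighted average of gamma. All variable names and the data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: X confounds treatment D and outcome Y (true effect 2).
n = 400
X = rng.normal(size=(n, 1))
D = 0.5 * X[:, 0] + rng.normal(size=n)
Y = 2.0 * D + X[:, 0] + rng.normal(scale=0.1, size=n)

def rbf(A, B, scale=2.0):
    """Gaussian kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * scale ** 2))

# Stage 1: outcome regression gamma(X, D) ~ E[Y | X, D] via kernel ridge.
Z = np.column_stack([X, D])
alpha = np.linalg.solve(rbf(Z, Z) + 1e-2 * np.eye(n), Y)

def gamma(Xq, dq):
    Zq = np.column_stack([Xq, np.full(len(Xq), dq)])
    return rbf(Zq, Z) @ alpha

# Stage 2: conditional mean embedding of P(X | D = d), represented as ridge
# weights beta(d) so that sum_i beta_i(d) f(x_i) ~ E[f(X) | D = d].
Dm = D[:, None]
W = np.linalg.solve(rbf(Dm, Dm, scale=1.0) + 1e-2 * np.eye(n), np.eye(n))

def att(d_treated, d_new):
    """Mean of gamma(X, d_new) under P(X | D = d_treated)."""
    beta = W @ rbf(Dm, np.array([[d_treated]]), scale=1.0)[:, 0]
    beta /= beta.sum()  # normalise the embedding weights (common heuristic)
    return beta @ gamma(X, d_new)

# Effect of moving the d=1 subpopulation from treatment 1 to treatment 0.
effect_on_treated = att(1.0, 1.0) - att(1.0, 0.0)
```

Chaining the two ridge regressions in this way is the sense in which the approach generalises two-stage least squares: the first stage learns the outcome model, and the second learns the conditional distribution over which it is averaged.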
Bio:
Arthur Gretton is a Professor with the Gatsby Computational Neuroscience Unit, and director of the Centre for Computational Statistics and Machine Learning (CSML) at UCL. His recent research interests include causal inference and representation learning, design and training of generative models, and nonparametric hypothesis testing.
Arthur has been an associate editor at IEEE Transactions on Pattern Analysis and Machine Intelligence, an Action Editor for JMLR, a Senior Area Chair for NeurIPS (2018, 2021) and ICML (2022), a member of the COLT Program Committee in 2013, and a member of the Royal Statistical Society Research Section Committee since January 2020. Arthur was program co-chair for AISTATS in 2016, tutorials co-chair for ICML 2018, workshops co-chair for ICML 2019, program co-chair for the DALI workshop in 2019, and co-organiser of the Machine Learning Summer School 2019 in London.