ML Tea | MIT CSAIL

Seminar Series

ML Tea

October 06

On counterfactual inference with unobserved confounding

Abhin Shah

CSAIL and LIDS

Part Of

ML Tea

4:00P - 4:30P

Location

Room 32-370

Date: 2023-10-06 (America/New_York)

Abstract: Given an observational study with n independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one p-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes and exacerbates the heterogeneity across units. Modeling the conditional distribution of the outcomes as an exponential family, we reduce learning the unit-level counterfactual distributions to learning n exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all n samples to jointly learn all n parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are an s-sparse linear combination of k known vectors, the error is O(s log k / p). En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing unobserved confounders.

Speaker bio: Abhin Shah is a sixth-year Ph.D. student advised by Prof. Devavrat Shah and Prof. Greg Wornell. He is a recipient of MIT’s Jacobs Presidential Fellowship. His research interests include theoretical and applied aspects of trustworthy machine learning with a focus on causality and fairness.

September 29

Linear Attention Is (maybe) All You Need (to understand Transformer optimization)

Kwangjun Ahn

LIDS & EECS

Part Of

ML Tea

3:00P - 3:30P

Location

Room 32-882 (Hewlett)

Date: 2023-09-29 (America/New_York)

Abstract: Transformer training is notoriously difficult, requiring careful design of optimizers and the use of various heuristics. We make progress towards understanding the subtleties of training transformers by carefully studying a simple yet canonical linearized shallow transformer model. Specifically, we train linear transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023) and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that the linearized models mimic several prominent aspects of transformers vis-a-vis their training dynamics. Consequently, the results of this paper hold the promise of identifying a simple transformer model that might be a valuable, realistic proxy for understanding transformers.

Speaker bio: Kwangjun Ahn is a final-year PhD student at MIT with the Department of EECS (Electrical Engineering & Computer Science) and the Laboratory for Information and Decision Systems (LIDS). His advisors are Profs. Suvrit Sra and Ali Jadbabaie. He also works part time at Google Research, where he focuses on accelerating LLM inference with the Speech & Language Algorithms Team. His current research interests include understanding LLM optimization and how to speed it up. He has worked on various topics over the years, including machine learning theory, optimization, statistics, and learning for control.

September 22

Machine learning of model errors in dynamical systems

Matthew Levine

Part Of

ML Tea

4:00P - 4:30P

Location

Room 32-370

Date: 2023-09-22 (America/New_York)

Abstract: The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. Here, we present a unifying framework for blending mechanistic and machine-learning approaches for identifying dynamical systems from data. This framework is agnostic to the chosen machine learning model parameterization, and casts the problem in both continuous and discrete time. We will also show recent developments that allow these methods to learn from noisy, partial observations. We first study model error from the learning theory perspective, defining the excess risk and generalization error. For a linear model of the error used to learn about ergodic dynamical systems, both excess risk and generalization error are bounded by terms that diminish with the square root of T (the length of the training trajectory data). In our numerical examples, we first study an idealized, fully observed Lorenz system with model error, and demonstrate that hybrid methods substantially outperform solely data-driven and solely mechanistic approaches. Then, we present recent results for modeling partially observed Lorenz dynamics that leverage both data assimilation and neural differential equations. Joint work with Andrew Stuart.

September 15

Large-Scale Study of Temporal Shift in Health Insurance Claims

Christina Ji

CSAIL MIT

Part Of

ML Tea

4:30P - 5:00P

Location

32-G882

Date: 2023-09-15 (America/New_York)

Abstract: Most machine learning models for predicting clinical outcomes are developed using historical data. Yet, even if these models are deployed in the near future, dataset shift over time may result in less than ideal performance. To capture this phenomenon, we consider a task (that is, an outcome to be predicted at a particular time point) to be non-stationary if a historical model is no longer optimal for predicting that outcome. We build an algorithm to test for temporal shift either at the population level or within a discovered sub-population. Then, we construct a meta-algorithm to perform a retrospective scan for temporal shift on a large collection of tasks. To our knowledge, our algorithms enable the first comprehensive evaluation of temporal shift in healthcare. We create 1,010 tasks by evaluating 242 healthcare outcomes for temporal shift from 2015 to 2020 on a health insurance claims dataset. Of these tasks, 9.7% show temporal shifts at the population level, and 93.0% have some sub-population affected by shifts. We dive into case studies to understand the clinical implications. Our analysis highlights the widespread prevalence of temporal shifts in healthcare.

Bio: Christina Ji is a fifth-year PhD student in the clinical ML group advised by David Sontag. Her research is on detecting and addressing distribution shift over time in healthcare settings. She has also worked on characterizing variation in treatment policies with causal inference methods and evaluating reinforcement learning policies.