
Seminar Series

September 29

Linear Attention Is (maybe) All You Need (to understand Transformer optimization)
3:00–3:30 PM ET, Room 32-882 (Hewlett)

Abstract: Transformer training is notoriously difficult, requiring careful optimizer design and the use of various heuristics. We make progress toward understanding the subtleties of training transformers by carefully studying a simple yet canonical linearized shallow transformer model. Specifically, we train linear transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023) and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that the linearized models mimic several prominent aspects of transformer training dynamics. Consequently, the results of this paper hold the promise of identifying a simple transformer model that might be a valuable, realistic proxy for understanding transformers.

Speaker bio: Kwangjun Ahn is a final-year PhD student at MIT in the Department of EECS (Electrical Engineering & Computer Science) and the Laboratory for Information and Decision Systems (LIDS), advised by Profs. Suvrit Sra and Ali Jadbabaie. He also works part-time at Google Research, where he works on accelerating LLM inference with the Speech & Language Algorithms Team. His current research interests include understanding and speeding up LLM optimization. Over the years he has worked on a range of topics, including machine learning theory, optimization, statistics, and learning for control.
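For intuition about the setup, here is a minimal sketch (in PyTorch) of the kind of linearized model the talk studies: a single linear self-attention layer (softmax removed) trained on in-context linear regression prompts, in the spirit of von Oswald et al. (ICML 2023). The dimensions, hyperparameters, and token layout below are illustrative assumptions, not the speaker's exact construction.

```python
# A minimal sketch of a linear (softmax-free) attention layer trained on
# in-context linear regression; all hyperparameters are illustrative.
import torch

torch.manual_seed(0)
d, n_ctx, batch = 5, 20, 64          # input dim, context length, batch size

def sample_tasks(batch, n_ctx, d):
    """Each prompt: n_ctx (x, y) pairs from a random linear task w, plus a query x."""
    w = torch.randn(batch, d, 1)
    x = torch.randn(batch, n_ctx + 1, d)
    y = x @ w                         # (batch, n_ctx+1, 1)
    z = torch.cat([x, y], dim=-1)     # each token is a concatenated (x_i, y_i)
    z[:, -1, -1] = 0.0                # hide the query's label
    return z, y[:, -1, 0]             # prompt tokens, target label for the query

class LinearAttention(torch.nn.Module):
    """Self-attention with the softmax removed: out = Z + (Z Wq)(Z Wk)^T (Z Wv) / n."""
    def __init__(self, dim):
        super().__init__()
        self.Wq = torch.nn.Linear(dim, dim, bias=False)
        self.Wk = torch.nn.Linear(dim, dim, bias=False)
        self.Wv = torch.nn.Linear(dim, dim, bias=False)

    def forward(self, z):
        attn = self.Wq(z) @ self.Wk(z).transpose(1, 2) / z.shape[1]
        return z + attn @ self.Wv(z)  # residual connection

model = LinearAttention(d + 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    z, target = sample_tasks(batch, n_ctx, d)
    pred = model(z)[:, -1, -1]        # read the prediction off the query token
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```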

September 22

Machine learning of model errors in dynamical systems
4:00–4:30 PM ET, Room 32-370

Abstract: The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. Here, we present a unifying framework for blending mechanistic and machine-learning approaches to identifying dynamical systems from data. This framework is agnostic to the chosen machine-learning model parameterization and casts the problem in both continuous and discrete time. We will also show recent developments that allow these methods to learn from noisy, partial observations. We first study model error from the learning-theory perspective, defining the excess risk and generalization error. For a linear model of the error used to learn about ergodic dynamical systems, both the excess risk and the generalization error are bounded by terms that diminish with the square root of T, the length of the training trajectory. In our numerical examples, we first study an idealized, fully observed Lorenz system with model error and demonstrate that hybrid methods substantially outperform both solely data-driven and solely mechanistic approaches. Then, we present recent results on modeling partially observed Lorenz dynamics that leverage both data assimilation and neural differential equations. Joint work with Andrew Stuart.
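To make the hybrid framework concrete, here is a minimal sketch (in PyTorch) of one common instantiation: the vector field is modeled as an imperfect mechanistic term plus a learned neural correction, dx/dt = f_mech(x) + NN(x), fit by matching finite-difference derivatives along a fully observed Lorenz-63 trajectory. The missing term, network size, and training details are illustrative assumptions, not the specific models from the talk.

```python
# A minimal sketch of hybrid mechanistic + ML model-error learning on
# Lorenz-63; the "imperfect" mechanistic model and all hyperparameters
# are illustrative assumptions.
import torch

torch.manual_seed(0)

def lorenz_true(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """The full Lorenz-63 vector field (ground truth)."""
    dx = sigma * (x[..., 1] - x[..., 0])
    dy = x[..., 0] * (rho - x[..., 2]) - x[..., 1]
    dz = x[..., 0] * x[..., 1] - beta * x[..., 2]
    return torch.stack([dx, dy, dz], dim=-1)

def lorenz_mech(x, sigma=10.0, beta=8.0 / 3.0):
    """An imperfect mechanistic model: the rho-coupling term is missing."""
    dx = sigma * (x[..., 1] - x[..., 0])
    dy = -x[..., 1]
    dz = x[..., 0] * x[..., 1] - beta * x[..., 2]
    return torch.stack([dx, dy, dz], dim=-1)

# Simulate training data with forward Euler (small step for stability).
dt, n_steps = 1e-3, 20000
traj = [torch.tensor([1.0, 1.0, 1.0])]
for _ in range(n_steps):
    traj.append(traj[-1] + dt * lorenz_true(traj[-1]))
traj = torch.stack(traj)
deriv = (traj[1:] - traj[:-1]) / dt          # finite-difference derivatives

# Small network for the model-error (correction) term.
net = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.Tanh(), torch.nn.Linear(64, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    idx = torch.randint(0, n_steps, (256,))
    x = traj[idx]
    # Fit f_mech(x) + NN(x) to the observed derivatives.
    loss = ((lorenz_mech(x) + net(x) - deriv[idx]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        print(f"step {step}: derivative-matching loss {loss.item():.3f}")
```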

September 15

Large-Scale Study of Temporal Shift in Health Insurance Claims
4:30–5:00 PM ET, Room 32-G882

Abstract: Most machine learning models for predicting clinical outcomes are developed using historical data. Yet, even if these models are deployed in the near future, dataset shift over time may result in less-than-ideal performance. To capture this phenomenon, we consider a task (that is, an outcome to be predicted at a particular time point) to be non-stationary if a historical model is no longer optimal for predicting that outcome. We build an algorithm to test for temporal shift either at the population level or within a discovered sub-population. Then, we construct a meta-algorithm to perform a retrospective scan for temporal shift on a large collection of tasks. These algorithms enable what is, to our knowledge, the first comprehensive evaluation of temporal shift in healthcare. We create 1,010 tasks by evaluating 242 healthcare outcomes for temporal shift from 2015 to 2020 on a health insurance claims dataset. 9.7% of the tasks show temporal shift at the population level, and 93.0% have some sub-population affected by shift. We dive into case studies to understand the clinical implications. Our analysis highlights the widespread prevalence of temporal shifts in healthcare.

Bio: Christina Ji is a fifth-year PhD student in the Clinical ML group, advised by David Sontag. Her research focuses on detecting and addressing distribution shift over time in healthcare settings. She has also worked on characterizing variation in treatment policies with causal-inference methods and on evaluating reinforcement learning policies.
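As a rough illustration of the population-level test idea (not the speaker's actual algorithm), the sketch below flags a task as non-stationary when a model refit on recent data significantly outperforms the historical model on held-out recent data. The synthetic data, paired t-test, and significance threshold are all assumptions for illustration.

```python
# A minimal sketch of a population-level temporal-shift test on synthetic
# data; the data-generating drift, test, and threshold are illustrative.
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 20000, 10
X_old = rng.normal(size=(n, d))
X_new = rng.normal(size=(n, d))
w_old = rng.normal(size=d)
w_new = w_old + rng.normal(size=d)           # the outcome model drifts over time
y_old = rng.binomial(1, 1 / (1 + np.exp(-X_old @ w_old)))
y_new = rng.binomial(1, 1 / (1 + np.exp(-X_new @ w_new)))

# Historical model vs. a model refit on the recent period.
X_tr, X_te, y_tr, y_te = train_test_split(X_new, y_new, random_state=0)
hist = LogisticRegression(max_iter=1000).fit(X_old, y_old)
curr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

def per_example_log_loss(model, X, y):
    """Per-example negative log-likelihood, for a paired comparison."""
    p = model.predict_proba(X)[:, 1].clip(1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# A paired test asks whether the historical model is significantly worse
# on recent held-out data; if so, the task is flagged as non-stationary.
diff = per_example_log_loss(hist, X_te, y_te) - per_example_log_loss(curr, X_te, y_te)
t, pval = stats.ttest_1samp(diff, 0.0, alternative="greater")
print(f"mean loss gap {diff.mean():.4f}, p-value {pval:.2e}")
print("temporal shift detected" if pval < 0.05 else "no shift detected")
```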