
Seminar Series

November 07

2023-11-07, 17:00-17:30 (America/New_York)
The Journey, not the Destination: How Data Guides Diffusion Models
Abstract: Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity. However, attributing these images back to the training data (that is, identifying specific training examples which caused an image to be generated) remains a challenge. In this paper, we propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions. Then, we provide a method for computing such attributions efficiently by leveraging recent work on data attribution in the supervised setting. Finally, we apply our method to find (and evaluate) such attributions for diffusion models trained on CIFAR-10 and MS COCO.
Speaker bio: Josh is a second-year PhD student working with Aleksander Madry. Josh's research focuses on building machine learning models that are safe and robust when deployed in the real world.
Room 32-G882 (Hewlett Room)

November 03

2023-11-03, 16:00-16:30 (America/New_York)
Operator SVD with Neural Networks via Nested Low-Rank Approximation
Abstract: Top-$L$ eigenvalue decomposition (EVD) of a given linear operator, or finding its top-$L$ eigenvalues and eigenfunctions, is a fundamental task in many machine learning and scientific simulation problems. For high-dimensional eigenvalue problems, training neural networks to parameterize the eigenfunctions is considered a promising alternative to classical numerical linear algebra techniques. While several optimization frameworks have been proposed for this parametric approach, existing proposals use ad hoc regularization to obtain orthogonal eigenfunctions, inherently suffer from biased gradient estimates, or both. In this talk, I will present a new optimization framework based on the low-rank approximation characterization of a truncated singular value decomposition (SVD), accompanied by a technique called nesting for correctly learning the top-$L$ singular values and singular functions up to degeneracy. Top-$L$ EVD can be performed as a special case. The proposed optimization framework is easy to implement with off-the-shelf gradient-based optimization algorithms, since (1) it is based on an unconstrained optimization problem that naturally admits an unbiased gradient estimator, and (2) it works without any extra orthonormalization steps or regularization terms. The proposed optimization framework can be used in a variety of application scenarios, and I will briefly discuss its application in machine learning and computational physics.
Speaker bio: Jongha (Jon) Ryu is a postdoctoral associate at the Research Laboratory of Electronics (RLE), hosted by Prof. Gregory W. Wornell. His research aims to develop efficient, reliable, and robust machine learning algorithms with provable performance guarantees, drawing inspiration especially from information theory. He is currently interested in representation learning, generative models, and learning with uncertainty.
Room 32-G882 (Hewlett Room)

October 27

2023-10-27, 16:00-16:30 (America/New_York)
Semantics and Learning for Active Robot Perception in Dynamic Environments
Abstract: The ability to autonomously explore and model an unknown and changing environment is a fundamental capability for robot autonomy, and a prerequisite for numerous applications in industrial, construction, household, service, and assistive robotics. This talk explores how various forms of scene understanding, ranging from traditional geometry to end-to-end learning, semantic perception, and abstraction, can enable robots to actively reconstruct an unknown environment, detect and understand dynamic entities, and leverage prediction and adaptation for improved task performance in changing scenes. The presented methods are validated running on board fully autonomous robots, and the code is released as open source.
Speaker bio: Lukas Schmid is a postdoctoral fellow working with Luca Carlone at the MIT-SPARK Lab. Before that, he was briefly a postdoctoral researcher at the Autonomous Systems Lab led by Prof. Roland Siegwart at ETH Zürich, Switzerland, where he also obtained his PhD in 2022 and his M.Sc. in Robotics, Systems, and Control in 2019. Among other honors, his work received the Willi Studer Prize for the best M.Sc. graduate, the ETH Medal for outstanding master's theses, and a Swiss National Science Foundation postdoctoral fellowship.
Room 32-G882 (Hewlett Room)

October 20

2023-10-20, 16:00-16:30 (America/New_York)
Feature Geometry and Multivariate Dependence Learning
Abstract: In this talk, we present a geometric framework for learning and processing information from data. First, we introduce the feature geometry, which unifies statistical dependence and features in a functional space equipped with geometric structures. Then, we formulate each learning problem as solving for the optimal feature representation of the associated dependence component. Specifically, we will demonstrate deep neural networks as one specific method for achieving this goal. Building on this observation, we will propose more adaptable ways to design neural networks for multivariate learning tasks. We will discuss several learning applications, including (1) handling multimodal data with missing modalities and (2) learning dependence structures from sequential data. [Based on https://arxiv.org/abs/2309.10140]
Speaker bio: Xiangxiang Xu is a postdoctoral associate in the Department of EECS at MIT, hosted by Prof. Lizhong Zheng. His research focuses on information theory and statistical learning, with applications in understanding and developing learning algorithms.
Room 32-370

October 13

2023-10-13, 16:00-16:30 (America/New_York)
The Dissimilarity Dimension: Sharper Bounds for Optimistic Algorithms
Abstract: The principle of Optimism in the Face of Uncertainty (OFU) is one of the foundational algorithmic design choices in Reinforcement Learning and Bandits. Optimistic algorithms balance exploration and exploitation by deploying data collection strategies that maximize expected rewards in plausible models. This is the basis of celebrated algorithms like the Upper Confidence Bound (UCB) for multi-armed bandits. For nearly a decade, the analysis of optimistic algorithms, including Optimistic Least Squares (OLS), in the context of rich reward function classes has relied on the concept of eluder dimension, introduced by Russo and Van Roy in 2013. In this talk we shed light on the limitations of the eluder dimension in capturing the true behavior of optimistic strategies in the realm of function approximation. We remedy these by introducing a novel statistical measure, the “dissimilarity dimension”. We show it can be used to provide sharper sample analysis of algorithms like OLS by establishing a link between regret and the dissimilarity dimension. To illustrate this, we will show that some function classes have arbitrarily large eluder dimension but constant dissimilarity. Our regret analysis draws inspiration from graph theory and may be of interest to the mathematically minded beyond the field of statistical learning theory. This talk sheds new light on the fundamental principle of optimism and its algorithms in the function approximation regime, advancing our understanding of these concepts.
Speaker bio: Aldo Pacchiano is a postdoctoral fellow at the Eric and Wendy Schmidt Center of the Broad Institute of MIT and Harvard. He obtained his PhD under the supervision of Profs. Michael Jordan and Peter Bartlett at UC Berkeley and was a Postdoctoral Researcher at Microsoft Research, NYC. He will join the Boston University Center for Computing and Data Sciences as an assistant professor in the summer of 2024. His research lies in the areas of Reinforcement Learning, Online Learning, Bandits, and Algorithmic Fairness. He is particularly interested in furthering our statistical understanding of learning phenomena in adaptive environments and in using these theoretical insights and techniques to design efficient and safe algorithms for scientific, engineering, and large-scale societal applications.
Room 32-370

October 06

2023-10-06, 16:00-16:30 (America/New_York)
On counterfactual inference with unobserved confounding
Abstract: Given an observational study with n independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one p-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes and exacerbates the heterogeneity across units. Modeling the conditional distribution of the outcomes as an exponential family, we reduce learning the unit-level counterfactual distributions to learning n exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all n samples to jointly learn all n parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are an s-sparse linear combination of k known vectors, the error is O(s log k/p). En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing unobserved confounders.
Speaker bio: Abhin Shah is a sixth-year Ph.D. student advised by Prof. Devavrat Shah and Prof. Greg Wornell. He is a recipient of MIT’s Jacobs Presidential Fellowship. His research interests include theoretical and applied aspects of trustworthy machine learning with a focus on causality and fairness.
Room 32-370

September 29

2023-09-29, 15:00-15:30 (America/New_York)
Linear Attention Is (maybe) All You Need (to understand Transformer optimization)
Abstract: Transformer training is notoriously difficult, requiring careful design of optimizers and the use of various heuristics. We make progress towards understanding the subtleties of training transformers by carefully studying a simple yet canonical linearized shallow transformer model. Specifically, we train linear transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023), and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that the linearized models mimic several prominent aspects of transformers vis-a-vis their training dynamics. Consequently, the results of this paper hold the promise of identifying a simple transformer model that might be a valuable, realistic proxy for understanding transformers.
Speaker bio: Kwangjun Ahn is a final-year PhD student at MIT with the Department of EECS (Electrical Engineering & Computer Science) and the Laboratory for Information and Decision Systems (LIDS). His advisors are Profs. Suvrit Sra and Ali Jadbabaie. He also works part time at Google Research on accelerating LLM inference with the Speech & Language Algorithms Team. His current research interests include understanding LLM optimization and how to speed it up. He has worked on various topics over the years, including machine learning theory, optimization, statistics, and learning for control.
Room 32-G882 (Hewlett Room)

September 22

2023-09-22, 16:00-16:30 (America/New_York)
Machine learning of model errors in dynamical systems
Abstract: The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. Here, we present a unifying framework for blending mechanistic and machine-learning approaches for identifying dynamical systems from data. This framework is agnostic to the chosen machine learning model parameterization, and casts the problem in both continuous and discrete time. We will also show recent developments that allow these methods to learn from noisy, partial observations. We first study model error from the learning theory perspective, defining the excess risk and generalization error. For a linear model of the error used to learn about ergodic dynamical systems, both excess risk and generalization error are bounded by terms that diminish with the square root of T (the length of the training trajectory data). In our numerical examples, we first study an idealized, fully observed Lorenz system with model error, and demonstrate that hybrid methods substantially outperform solely data-driven and solely mechanistic approaches. Then, we present recent results for modeling partially observed Lorenz dynamics that leverage both data assimilation and neural differential equations. Joint work with Andrew Stuart.
Room 32-370

September 15

2023-09-15, 16:30-17:00 (America/New_York)
Large-Scale Study of Temporal Shift in Health Insurance Claims
Abstract: Most machine learning models for predicting clinical outcomes are developed using historical data. Yet, even if these models are deployed in the near future, dataset shift over time may result in less than ideal performance. To capture this phenomenon, we consider a task (that is, an outcome to be predicted at a particular time point) to be non-stationary if a historical model is no longer optimal for predicting that outcome. We build an algorithm to test for temporal shift either at the population level or within a discovered sub-population. Then, we construct a meta-algorithm to perform a retrospective scan for temporal shift on a large collection of tasks. Our algorithms enable us to perform what is, to our knowledge, the first comprehensive evaluation of temporal shift in healthcare. We create 1,010 tasks by evaluating 242 healthcare outcomes for temporal shift from 2015 to 2020 on a health insurance claims dataset. 9.7% of the tasks show temporal shifts at the population level, and 93.0% have some sub-population affected by shifts. We dive into case studies to understand the clinical implications. Our analysis highlights the widespread prevalence of temporal shifts in healthcare.
Speaker bio: Christina Ji is a fifth-year PhD student in the clinical ML group advised by David Sontag. Her research is on detecting and addressing distribution shift over time in healthcare settings. She has also worked on characterizing variation in treatment policies with causal inference methods and evaluating reinforcement learning policies.
Room 32-G882