Seminar Series

October 21

Objective Approaches in a Subjective Medical World
October 21, 2024, 4:00–5:00 PM ET, 32-G882

Abstract: In today’s healthcare system, patients often feel disconnected from clinical professionals and from their own care journey. They receive a “one-size-fits-all” plan and are left out of the decision-making process, which can lead to a less satisfying experience. My research applies advanced AI technologies, including large language models, machine learning, and IoT, to challenges in healthcare, particularly patient-centered care delivery. I aim to enhance the accuracy and efficiency of healthcare systems by using these "objective approaches" to navigate the subjective aspects of medical practice, such as clinician notes and patient preferences found in electronic health records. A key aspect of my work is improving the transparency of AI-based healthcare applications, making them more understandable and trustworthy for clinicians and patients alike, and ensuring these technologies effectively meet the needs of both groups. I also emphasize personalizing healthcare by considering each patient's unique circumstances, including their preferences and socio-economic conditions. This research applies AI across a range of settings, from specific diseases like cancer to broader healthcare contexts, with the goal of improving both the delivery and the experience of healthcare. My work contributes to the development of AI tools that not only enhance clinical decision-making but also foster better human-AI interaction, ultimately leading to improved healthcare outcomes.

October 07

Contextualizing Self-Supervised Learning: A New Path Ahead
October 7, 2024, 4:00–4:30 PM ET, 32-G882 (Hewlett Room)

Abstract: Self-supervised learning (SSL) has achieved remarkable progress over the years, particularly in visual domains. However, recent advancements have plateaued due to performance bottlenecks, and more focus has shifted towards generative models. In this talk, we step back to analyze existing SSL paradigms and identify the lack of context as their most critical obstacle. To address this, we explore two approaches that incorporate contextual knowledge into SSL:

1. Contextual Self-Supervised Learning: learned representations adapt their inductive biases to diverse contexts, enhancing the flexibility and generality of SSL. (A minimal sketch of this idea appears below.)
2. Self-Correction: foundation models refine themselves by reflecting on their own predictions within a dynamically evolving context.

These insights illustrate new paths to craft self-supervision and highlight context as a key ingredient for building general-purpose SSL.

Paper links (both papers were accepted to NeurIPS 2024; the theoretical work on self-correction received the Spotlight Award at the ICML 2024 ICL Workshop):
* In-Context Symmetries: Self-Supervised Learning through Contextual World Models (https://arxiv.org/pdf/2405.18193)
* A Theoretical Understanding of Self-Correction through In-context Alignment (https://arxiv.org/pdf/2405.18634)

Bio: Yifei Wang is a postdoc at CSAIL, advised by Prof. Stefanie Jegelka. He earned his bachelor’s and Ph.D. degrees from Peking University. Yifei is broadly interested in machine learning and representation learning, with a focus on bridging the theory and practice of self-supervised learning. His first-author works have been recognized by multiple best paper awards, including the Best ML Paper Award at ECML-PKDD 2021, the Silver Best Paper Award at the ICML 2021 AdvML Workshop, and the Spotlight Award at the ICML 2024 ICL Workshop.
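
To make the first approach above concrete, here is a minimal, hypothetical sketch of what "representations that adapt to context" could look like: an encoder whose output is modulated by a pooled summary of unlabeled context examples. The module names and FiLM-style architecture are my own assumptions for illustration, not the design from the paper.

import torch
import torch.nn as nn

class ContextualEncoder(nn.Module):
    """Illustrative sketch: an SSL encoder whose representation is modulated
    by a summary of unlabeled context examples (architecture is hypothetical)."""

    def __init__(self, in_dim: int = 128, rep_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, rep_dim)
        )
        # Maps a pooled context embedding to per-feature scale and shift.
        self.film = nn.Linear(rep_dim, 2 * rep_dim)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        z = self.backbone(x)                    # (B, rep_dim)
        c = self.backbone(context).mean(dim=0)  # pool the context set -> (rep_dim,)
        scale, shift = self.film(c).chunk(2, dim=-1)
        return (1 + scale) * z + shift          # context-adapted representation

# A standard SSL objective (e.g., a contrastive loss over augmented views)
# would then be applied to these context-conditioned representations.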

September 16

Multi-sensory perception from the top down
September 16, 2024, 4:00–4:30 PM ET, 32-G882 (Hewlett Room)

Abstract: Human sensory experiences, such as vision, hearing, touch, and smell, serve as natural interfaces for perceiving and reasoning about the world around us. Understanding 3D environments is crucial for applications like video processing, robotics, and augmented reality. This work explores how material properties and microgeometry can be learned through cross-modal associations between sight, sound, and touch. I will introduce a method that leverages in-the-wild online videos to study interactable audio generation via dense visual cues. Additionally, I will share recent advancements in multimodal scene understanding and discuss future directions for the field.

Bio: Anna is a senior undergraduate at Tsinghua University. Her research focuses on multi-modal perception from the perspectives of audio and vision. She is an intern in Jim Glass's group.

May 02

Decomposing Predictions by Modeling Model Computation
May 2, 2024, 4:00–4:30 PM ET, Room 32-G449 (Patil/Kiva)

Abstract: How does the internal computation of a machine learning model transform inputs into predictions? In this work, we introduce a task called component modeling that aims to address this question. The goal of component modeling is to decompose an ML model's prediction in terms of its components: simple functions (e.g., convolution filters, attention heads) that are the "building blocks" of model computation. We focus on a special case of this task, component attribution, where the goal is to estimate the counterfactual impact of individual components on a given prediction. We then present COAR, a scalable algorithm for estimating component attributions, and demonstrate its effectiveness across models, datasets, and modalities. Finally, we show that component attributions estimated with COAR directly enable model editing across five tasks: fixing model errors, "forgetting" specific classes, boosting subpopulation robustness, localizing backdoor attacks, and improving robustness to typographic attacks.

Paper: https://arxiv.org/abs/2404.11534
Blog post: https://gradientscience.org/modelcomponents/

Bio: Harshay is a PhD student at MIT CSAIL, advised by Aleksander Madry. His research interests are broadly in developing tools to understand and steer model behavior. Recently, he has been working on understanding how training data and learning algorithms collectively shape neural network representations.
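
A rough sketch of the ablate-and-fit idea behind component attribution (my simplification, not COAR's exact algorithm; see the paper for the real method): ablate random subsets of components, record how a prediction changes, and fit a linear model from ablation masks to outputs.

import numpy as np

def component_attributions(predict_with_mask, n_components,
                           n_trials=5000, ablate_frac=0.1, seed=0):
    """Sketch: estimate per-component attributions by regressing a model's
    output on random ablation masks. `predict_with_mask` is an assumed
    callable that takes a boolean keep-mask over components and returns the
    model's output (e.g., correct-class margin) on one fixed example."""
    rng = np.random.default_rng(seed)
    masks = rng.random((n_trials, n_components)) > ablate_frac  # True = keep
    outputs = np.array([predict_with_mask(m) for m in masks])
    # Least-squares fit: outputs ~ masks @ w + b; the weight w_j estimates
    # the counterfactual effect of keeping (vs. ablating) component j.
    X = np.hstack([masks.astype(float), np.ones((n_trials, 1))])
    coef, *_ = np.linalg.lstsq(X, outputs, rcond=None)
    return coef[:-1]  # one attribution per component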

April 25

ML-Tea: Ablation Based Counterfactuals
April 25, 2024, 4:00–4:30 PM ET, 32-370

Abstract: The widespread adoption of diffusion models for creative uses such as image, video, and audio synthesis has raised serious questions surrounding the use of training data and its regulation. To arrive at a resolution, it is important to understand how such models are influenced by their training data. Due to the complexity involved in training and sampling from these models, the ultimate impact of the training data is challenging to characterize, confounding regulatory and scientific efforts. In this work, we explore the idea of an Ablation Based Counterfactual, which allows us to compute counterfactual scenarios in which training data is missing by ablating parts of a model, circumventing the need to retrain. This enables important downstream tasks such as data attribution and brings us closer to understanding the influence of training data on these models.
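
In the same spirit, here is a minimal sketch of computing one such counterfactual, assuming some attribution of training data to model components is already available; the data-to-module mapping `ablate_names` and the sampler `sample_fn` are placeholders I am inventing for illustration, not the paper's method.

import torch

@torch.no_grad()
def ablation_counterfactual(model, ablate_names, sample_fn, noise):
    """Sketch: approximate 'what the model would generate had certain training
    data been absent' by zeroing the submodules attributed to that data,
    sampling, then restoring the weights; no retraining required."""
    backups = {}
    for name, module in model.named_modules():
        if name in ablate_names:
            for p in module.parameters():
                backups[p] = p.detach().clone()
                p.zero_()                     # ablate this component
    counterfactual = sample_fn(model, noise)  # sample with components ablated
    for p, saved in backups.items():          # restore the original model
        p.copy_(saved)
    return counterfactual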

April 18

Improving data efficiency and accessibility for general robotic manipulation
April 18, 2024, 4:00–4:30 PM ET, Room 32-370

Abstract: How can data-driven approaches endow robots with diverse manipulative skills and robust performance in unstructured environments? Despite recent progress, many open questions remain in this area, such as: (1) How can we define and model the data distribution for robotic systems? (2) In light of data scarcity, what strategies can algorithms employ to enhance performance? (3) What is the best way to scale up robotic data collection? In this talk, Hao-Shu Fang will share his research on enhancing the efficiency of robot learning algorithms and democratizing access to large-scale robotic manipulation data. He will also discuss several open questions in data-driven robotic manipulation, offering insights into the challenges they pose.

Bio: Hao-Shu Fang is a postdoctoral researcher collaborating with Pulkit Agrawal and Edward Adelson. His research focuses on general robotic manipulation. Recently, he has been investigating how to integrate visual-tactile perception for improved manipulation and how to train a multi-task robotic foundation behavioral model.

April 11

Removing Biases from Molecular Representations via Information Maximization
April 11, 2024, 4:00–4:30 PM ET, Room 32-370

Abstract: High-throughput drug screening (using cell imaging or gene expression measurements as readouts of drug effect) is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier, and adaptively reweighs samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks, including molecular property prediction and molecule-phenotype retrieval. Additionally, we show how InfoCORE offers a versatile framework that addresses general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.

Bio: I am a second-year PhD student at MIT EECS, advised by Tommi Jaakkola and Caroline Uhler. I am also affiliated with the Eric and Wendy Schmidt Center (EWSC) at the Broad Institute. My research interests lie broadly in machine learning, representation learning, and AI for science. Recently my research has focused on multi-modal representation learning and perturbation modelling for drug discovery. Before my PhD, I obtained my Bachelor's degree from Tsinghua University.
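
The reweighting intuition can be caricatured in a few lines. The following is a loose, assumption-laden simplification (not InfoCORE's actual variational bound): an auxiliary classifier predicts the experimental batch from the representation, and samples are reweighted in a contrastive loss accordingly.

import torch
import torch.nn.functional as F

def reweighted_contrastive_loss(z1, z2, batch_logits, batch_ids, temp=0.1):
    """Loose, hypothetical sketch of confounder removal by reweighting.
    z1, z2: (B, d) paired representations of the same molecule under two
    views/readouts; batch_logits: (B, n_batches) outputs of a classifier
    predicting the experimental batch from z; batch_ids: (B,) true batches."""
    # Upweight samples whose batch the classifier cannot identify: their
    # representations already carry little batch (confounder) information.
    p_batch = batch_logits.softmax(dim=-1)
    w = 1.0 - p_batch.gather(1, batch_ids[:, None]).squeeze(1)
    w = (w / w.sum()).detach()

    sim = F.normalize(z1, dim=1) @ F.normalize(z2, dim=1).T / temp  # (B, B)
    targets = torch.arange(len(z1), device=z1.device)
    per_sample = F.cross_entropy(sim, targets, reduction="none")
    return (w * per_sample).sum()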

April 04

Interpolating Item and User Fairness in Multi-Sided Recommendations
April 4, 2024, 4:00–4:30 PM ET, 32-370

Abstract: Today's online platforms rely heavily on algorithmic recommendations to bolster user engagement and drive revenue. However, such recommendations can impact the diverse stakeholders involved, namely the platform, items (sellers), and users (customers), each with their own objectives. In such multi-sided platforms, finding an appropriate middle ground becomes a complex operational challenge. Motivated by this, we formulate a novel fair recommendation framework, called Problem (FAIR), that not only maximizes the platform's revenue but also accommodates varying fairness considerations from the perspectives of items and users. Our framework's distinguishing trait lies in its flexibility: it allows the platform to specify any definitions of item/user fairness that are deemed appropriate, as well as decide the "price of fairness" it is willing to pay to ensure fairness for other stakeholders. We further examine Problem (FAIR) in a dynamic online setting, where the platform must learn user data and generate fair recommendations simultaneously in real time, two tasks that are often at odds. In the face of this additional challenge, we devise a low-regret online recommendation algorithm, called FORM, that effectively balances learning with performing fair recommendation. Our theoretical analysis confirms that FORM proficiently maintains the platform's revenue while ensuring desired levels of fairness for both items and users. Finally, we demonstrate the efficacy of our framework and method via several case studies on real-world data.

Bio: Qinyi Chen is a fourth-year PhD student in the Operations Research Center (ORC) at MIT, advised by Prof. Negin Golrezaei. Her research interests span machine learning and optimization, AI/ML fairness, approximation algorithms, and game and auction theory, with applications in digital platforms and marketplaces.
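
Since the platform may specify any fairness definitions, the static problem has the flavor of revenue maximization under fairness constraints. Here is a toy linear-program sketch of that flavor (my own framing and variable names, not the paper's exact Problem (FAIR)):

import numpy as np
from scipy.optimize import linprog

def fair_recommendation_lp(revenue, user_value, min_item_exposure, min_user_value):
    """Toy sketch: x[u, i] = probability user u is shown item i; maximize
    platform revenue subject to item-side exposure floors and user-side
    value floors. revenue, user_value: (U, I); min_item_exposure: (I,);
    min_user_value: (U,). All inputs and floors are illustrative."""
    U, I = revenue.shape
    c = -revenue.ravel()  # linprog minimizes, so negate revenue

    A_ub, b_ub = [], []
    for i in range(I):    # item fairness: sum_u x[u, i] >= floor_i
        row = np.zeros(U * I); row[i::I] = -1.0
        A_ub.append(row); b_ub.append(-min_item_exposure[i])
    for u in range(U):    # user fairness: sum_i value[u, i] * x[u, i] >= floor_u
        row = np.zeros(U * I); row[u * I:(u + 1) * I] = -user_value[u]
        A_ub.append(row); b_ub.append(-min_user_value[u])

    # Each user is shown exactly one item in expectation: sum_i x[u, i] = 1.
    A_eq = np.zeros((U, U * I)); b_eq = np.ones(U)
    for u in range(U):
        A_eq[u, u * I:(u + 1) * I] = 1.0

    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    return res.x.reshape(U, I), -res.fun  # recommendation policy, revenue

Raising the exposure or value floors traces out the "price of fairness": the revenue the platform forgoes to protect the other stakeholders.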

March 21

When is Agnostic Reinforcement Learning Statistically Tractable?
March 21, 2024, 4:00–4:30 PM ET, 32-370

Abstract: We study the problem of agnostic PAC reinforcement learning (RL): given a policy class Π, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an ε-suboptimal policy with respect to Π? Towards that end, we introduce a new complexity measure, called the spanning capacity, that depends solely on the set Π and is independent of the MDP dynamics. With a generative model, we show that for any policy class Π, bounded spanning capacity characterizes PAC learnability. However, for online RL, the situation is more subtle: we show there exists a policy class Π with bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between deterministic/stochastic MDPs under online access). On the positive side, we identify an additional sunflower structure which, in conjunction with bounded spanning capacity, enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as techniques for reachable-state identification and policy evaluation in reward-free exploration.
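
For concreteness, the learning goal sketched above can be written as follows, with V(π) denoting the expected return of policy π in the unknown MDP and (ε, δ) the usual PAC parameters:

% Agnostic PAC RL: from as few rounds of interaction as possible, output a
% policy \hat{\pi} that competes with the best policy in the class \Pi:
\[
  V(\hat{\pi}) \;\ge\; \max_{\pi \in \Pi} V(\pi) - \varepsilon
  \qquad \text{with probability at least } 1 - \delta.
\]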

March 14

What's the Erdős number of an LLM? Mathematical and algorithmic discovery via machine learning
March 14, 2024, 4:00–4:30 PM ET, 32-370

Abstract: We survey methods for discovering novel mathematics and novel algorithms via machine learning (AlphaTensor, FunSearch, AlphaGeometry, AI Feynman, etc.). We will present other people's work rather than our own; this is a review in the form of a presentation.

March 07

Human Expertise in Algorithmic Prediction
March 7, 2024, 4:00–4:30 PM ET, Room 32-370 and on Zoom

Abstract: We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach focuses on the use of human judgment to distinguish inputs which ‘look the same’ to any feasible predictive algorithm. We argue that this framing clarifies the problem of human/AI collaboration in prediction tasks, as experts often have access to information (particularly subjective information) which is not encoded in the algorithm's training data. We use this insight to develop a set of principled algorithms for selectively incorporating human feedback only when it improves the performance of any feasible predictor. We find empirically that although algorithms often outperform their human counterparts on average, human judgment can significantly improve algorithmic predictions on specific instances (which can be identified ex ante). In an X-ray classification task, we find that this subset constitutes nearly 30% of the patient population. Our approach provides a natural way of uncovering this heterogeneity and thus enabling effective human-AI collaboration.

Speaker Bio: Rohan is a second-year PhD student in EECS, where he is advised by Manish Raghavan and Devavrat Shah. His research interests are at the intersection of machine learning and economics, with a particular focus on causal inference, human/AI collaboration, and data-driven decision making.
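
One toy way to picture inputs that "look the same" to any feasible predictor (my illustration, not the authors' algorithm): if the training data contains identical feature vectors with conflicting labels, no predictor over those features can separate them, so human judgment is consulted exactly there.

from collections import defaultdict
import numpy as np

def selective_human_override(X_train, y_train, X, algo_pred, human_pred):
    """Toy sketch: defer to the human only on instances whose feature
    vectors appear in the training data with conflicting labels, i.e.,
    instances indistinguishable to any predictor over these features."""
    outcomes = defaultdict(list)
    for x, y in zip(map(tuple, X_train), y_train):
        outcomes[x].append(y)

    preds = []
    for x, a, h in zip(map(tuple, X), algo_pred, human_pred):
        ys = outcomes.get(x, [])
        ambiguous = len(set(ys)) > 1   # same input, different observed labels
        preds.append(h if ambiguous else a)
    return np.array(preds)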

February 29

Context is Environment

Sharut Gupta
MIT CSAIL

Context is Environment
February 29, 2024, 5:00–5:30 PM ET, Room 32-G449 (Patil/Kiva Seminar Room)

Abstract: Two lines of work are taking center stage in AI research. On the one hand, the community is making increasing efforts to build models that discard spurious correlations and generalize better in novel test environments. Unfortunately, the hard lesson so far is that no proposal convincingly outperforms a simple empirical risk minimization baseline. On the other hand, large language models (LLMs) have erupted as algorithms able to learn in context, generalizing on the fly to eclectic contextual circumstances that users enforce by means of prompting. In this paper, we argue that context is environment, and posit that in-context learning holds the key to better domain generalization. Via extensive theory and experiments, we show that paying attention to context (unlabeled examples as they arrive) allows our proposed In-Context Risk Minimization (ICRM) algorithm to zoom in on the test environment's risk minimizer, leading to significant out-of-distribution performance improvements. From all of this, two messages are worth taking home: researchers in domain generalization should consider environment as context and harness the adaptive power of in-context learning, while researchers in LLMs should consider context as environment, to better structure data towards generalization.

Speaker Bio: Sharut Gupta is a second-year Ph.D. student at MIT CSAIL, working with Prof. Stefanie Jegelka. Her research mainly focuses on building robust and generalizable machine learning systems under minimal supervision. She enjoys working on out-of-distribution generalization, self-supervised learning, causal inference, and representation learning.
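
A minimal sketch of what "context as environment" can look like operationally (the architecture and training recipe here are my assumptions, not necessarily ICRM as published): a sequence model reads the current input together with the preceding unlabeled inputs from the same environment, so its prediction adapts as test examples arrive.

import torch
import torch.nn as nn

class InContextPredictor(nn.Module):
    """Hypothetical sketch: predict y_t from x_t plus the preceding
    unlabeled inputs x_1..x_{t-1} drawn from the same environment."""

    def __init__(self, in_dim, n_classes, hidden=64):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x_seq):               # (B, T, in_dim); one environment per row
        h, _ = self.rnn(self.embed(x_seq))  # causal summary of the context so far
        return self.head(h)                 # per-step logits, shape (B, T, C)

# Training: sample environments, draw a sequence of examples from each, and
# apply cross-entropy at every step; later positions learn to exploit the
# growing unlabeled context, i.e., to infer the environment from it.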

February 22

Ask Your Distribution Shift if Pre-Training is Right for You
February 22, 2024, 5:00–5:30 PM ET

Abstract: Pre-training is a widely used approach to develop models that are robust to distribution shifts. However, in practice its effectiveness varies: fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others (compared to training from scratch). In this work, we seek to characterize the failure modes that pre-training can and cannot address. In particular, we focus on two possible failure modes of models under distribution shift: poor extrapolation (e.g., they cannot generalize to a different domain) and biases in the training data (e.g., they rely on spurious features). Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases. After providing theoretical motivation and empirical evidence for this finding, we explore two of its implications for developing robust models: (1) pre-training and interventions designed to prevent exploiting biases have complementary robustness benefits, and (2) fine-tuning on a (very) small, non-diverse but de-biased dataset can result in significantly more robust models than fine-tuning on a large and diverse but biased dataset.

Speaker bio: Ben is a second-year PhD student at MIT, where he is advised by Aleksander Madry. He is interested in how we can develop machine learning models that can be safely deployed, with a focus on robustness to distribution shifts. Lately, he has been working on understanding how we can harness large-scale pre-training (e.g., CLIP, GPT) to develop robust task-specific models.

February 15

Efficiently Searching for Distributions

Sandeep Silwal
MIT CSAIL

Efficiently Searching for Distributions
February 15, 2024, 4:00–4:30 PM ET

Abstract: How efficiently can we search distributions? The problem is modeled as follows: we are given knowledge of k discrete distributions v_i for 1 <= i <= k over the domain [n] = {1,...,n}, which we can preprocess. Then we get samples from an unknown discrete distribution p, also over [n]. The goal is to output the closest distribution to p among the v_i's in TV distance (up to some small additive error). State-of-the-art sample-efficient algorithms require Theta(log k) samples and run in near-linear time.

We introduce a fresh perspective on the problem and ask if we can output the closest distribution in *sublinear* time. This question is particularly motivated as it is a generalization of the traditional nearest neighbor search problem: if we take enough samples, we can learn p explicitly up to low TV distance, and then find the closest v_i in o(k) time using standard nearest neighbor search. However, this approach requires Omega(n) samples. Thus, it is natural to ask: can we obtain both a sublinear number of samples and sublinear query time? We present some nice progress towards this question and uncover a very interesting statistical-computational trade-off.

This is joint work with Anders Aamand, Alex Andoni, Justin Chen, Piotr Indyk, Shyam Narayanan, and Haike Xu.

Bio: Sandeep is a final-year PhD student at MIT, advised by Piotr Indyk. His interests are broadly in fast algorithm design. Recently, he has been working at the intersection of machine learning and classical algorithms by designing provable algorithms in various ML settings, such as efficient algorithms for processing large datasets, as well as using ML to inspire algorithm design.
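
The "learn p explicitly, then scan" baseline mentioned above is easy to state in code. This sketch is the Omega(n)-sample approach the talk aims to beat, not the new algorithm:

import numpy as np

def closest_distribution_naive(samples, V):
    """Naive baseline: estimate p empirically over [n], then return the v_i
    minimizing total variation distance. Requires Omega(n) samples and O(kn)
    query time. `samples` holds observations in {0, ..., n-1}; V is a (k, n)
    row-stochastic matrix of the known distributions."""
    n = V.shape[1]
    p_hat = np.bincount(samples, minlength=n) / len(samples)
    tv = 0.5 * np.abs(V - p_hat).sum(axis=1)  # TV distance to each v_i
    return int(np.argmin(tv)), float(tv.min())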

December 08

Learning to Assess Disease and Health In Your Home
December 8, 2023, 4:00–4:30 PM ET, Room 32-G882 (Hewlett Room)

Abstract: The future of healthcare lies in delivering comprehensive medical services to patients in their own homes. As the global population ages and chronic diseases become increasingly prevalent, objective, longitudinal, and reliable health and disease assessment at home becomes crucial for early detection and prevention of hospitalization. In this talk, I will present new learning methods that use everyday devices for in-home healthcare. I will first describe a simple self-supervised framework for remote human vitals sensing using only daily smartphones. I will then introduce an AI-powered digital biomarker for Parkinson's disease that detects the disease, estimates its severity, and tracks its progression using nocturnal breathing signals. Together, these works showcase the potential of AI-based in-home assessment for various diseases and human health sensing, enabling remote monitoring of health-related conditions, timely care, and enhanced patient outcomes.

Speaker bio: Yuzhe Yang is a PhD candidate in computer science at MIT. He received his B.S. with honors in EECS from Peking University. His research interests include machine learning and AI for human disease, health, and medicine. His works on AI-enabled biomarkers for Parkinson's disease were named among the Ten Notable Advances in 2022 by Nature Medicine and the Ten Crucial Advances in Movement Disorders in 2022 by The Lancet Neurology. His research has been published in Nature Medicine, Science Translational Medicine, NeurIPS, ICML, ICLR, CVPR, and UbiComp. His work has been recognized by the MathWorks Fellowship, Takeda Fellowship, and Baidu PhD Scholarship, and has received media coverage from MIT Tech Review, Wall Street Journal, Forbes, BBC, The Washington Post, etc.

December 01

Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering
December 1, 2023, 4:00–4:30 PM ET, 32-G882 (Hewlett Room)

Abstract: We investigate the camera pose estimation problem in the context of 2D/3D medical image registration. The application is to align 2D intraoperative images (e.g., X-ray) to a patient's 3D preoperative volume (e.g., CT), helping provide 3D image guidance during minimally invasive surgeries. We present a patient-specific self-supervised approach that uses differentiable rendering to achieve the sub-millimeter accuracy required in this context. Some aspects of our work that may be of interest to the broader ML community include:
- How do you exactly compute the rendering equation for differentiable ray tracing through a voxel grid?
- What is the optimal representation of rotations and translations when using gradient descent to optimize poses?
- What is the optimal image loss function that achieves robust image registration while still being fast enough to use in real time?

Speaker bio: Vivek is a third-year PhD student in Polina Golland's group, broadly interested in 3D computer vision problems across science and medicine.
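
To show where those three questions enter, here is a schematic pose-optimization skeleton; the renderer `render_drr`, the naive 6-vector pose parameterization, and the loss choice are placeholders standing in for the design decisions the talk discusses, not the actual implementation:

import torch

def ncc(a, b, eps=1e-6):
    """Normalized cross-correlation between two images (one common choice of
    image loss; which loss is optimal is one of the talk's questions)."""
    a = (a - a.mean()) / (a.std() + eps)
    b = (b - b.mean()) / (b.std() + eps)
    return (a * b).mean()

def register(render_drr, ct_volume, xray, n_steps=200, lr=1e-2):
    """Schematic 2D/3D registration loop: optimize a pose so that a simulated
    X-ray (DRR) of the CT volume matches the observed intraoperative image.
    `render_drr` is an assumed differentiable renderer; how best to
    parameterize the pose is itself another of the talk's questions."""
    pose = torch.zeros(6, requires_grad=True)  # 3 rotation + 3 translation params
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        drr = render_drr(ct_volume, pose)      # differentiable X-ray rendering
        loss = 1.0 - ncc(drr, xray)            # maximize image similarity
        loss.backward()                        # gradients flow through the renderer
        opt.step()
    return pose.detach()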

November 17

A Game-Theoretic Perspective on Trustworthy Algorithms
November 17, 2023, 4:00–4:30 PM ET, 32-G882 (Hewlett Room)

Abstract: Many algorithms are trained on data provided by humans, such as those that power recommender systems and hiring decision aids. Most data-driven algorithms assume that user behavior is exogenous: a user would react to a given prompt (e.g., a recommendation or hiring suggestion) in the same way no matter what algorithm generated it. For example, algorithms that rely on an i.i.d. assumption inherently assume exogeneity. In practice, user behavior is not exogenous: users are *strategic*. For example, there are documented cases of TikTok users changing their scrolling behavior after realizing that the TikTok algorithm pays attention to dwell time, and of Uber drivers changing how they accept and cancel rides based on Uber's matching algorithm. What are the implications of breaking the exogeneity assumption? We answer this question in our work, modeling the interactions between a user and their data-driven platform as a repeated, two-player game. We leverage results from misspecified learning to characterize the effect of strategization on data-driven algorithms. As one of our main contributions, we find that designing trustworthy algorithms can go hand in hand with accurate estimation; that is, there is not necessarily a trade-off between performance and trustworthiness. We provide a formalization of trustworthiness that inspires potential interventions.
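
The exogeneity point can be made with a toy simulation (all numbers invented for illustration): a platform that estimates interest from dwell time, assuming honest behavior, is systematically misled once the user strategizes.

import numpy as np

def simulate(T=10_000, seed=0):
    """Toy illustration of broken exogeneity: the platform reads dwell time
    as a proxy for interest, but a strategic user who knows this inflates
    dwell time on content they want recommended more often."""
    rng = np.random.default_rng(seed)
    true_interest = 0.3                       # the quantity the platform wants
    honest = rng.exponential(true_interest, size=T)
    strategic = 2.0 * honest                  # same user, gaming the signal
    print("estimate from honest user:   ", honest.mean())     # ~0.3
    print("estimate from strategic user:", strategic.mean())  # ~0.6, biased

simulate()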