2025-04-28, 4:00–5:00 PM (America/New_York)
ML Tea: Evaluating Multiple Models Using Labeled and Unlabeled Data
Speaker: Shuvom Sadhuka
Abstract: It remains difficult to evaluate machine learning classifiers in the absence of a large, labeled dataset. While labeled data can be prohibitively expensive or impossible to obtain, unlabeled data is plentiful. Here, we introduce Semi-Supervised Model Evaluation (SSME), a method that uses both labeled and unlabeled data to evaluate machine learning classifiers. SSME is the first evaluation method to take advantage of the fact that (i) there are frequently multiple classifiers for the same task, (ii) continuous classifier scores are often available for all classes, and (iii) unlabeled data is often far more plentiful than labeled data. The key idea is to use a semi-supervised mixture model to estimate the joint distribution of ground truth labels and classifier predictions. We can then use this model to estimate any metric that is a function of classifier scores and ground truth labels (e.g., accuracy or expected calibration error). We present experiments in four domains where obtaining large labeled datasets is often impractical: (1) healthcare, (2) content moderation, (3) molecular property prediction, and (4) image annotation. Our results demonstrate that SSME estimates performance more accurately than do competing methods, reducing error by 5.1× relative to using labeled data alone and 2.4× relative to the next best competing method. SSME also improves accuracy when evaluating performance across subsets of the test distribution (e.g., specific demographic subgroups) and when evaluating the performance of language models.
Bio: Shuvom Sadhuka is a third-year PhD student in EECS, advised by Bonnie Berger. His research interests center on evaluation and uncertainty quantification, often with applications to biomedical data. In particular, he is interested in how to conduct evaluations of machine learning systems (both the data and models) along critical axes such as privacy and calibration in constrained settings (e.g., sparse or noisy labels). His PhD is supported by a Hertz Fellowship and an NSF GRFP. Prior to MIT, he received an AB in Computer Science and Statistics from Harvard.
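The mixture-model idea lends itself to a compact illustration. Below is a minimal, hypothetical sketch (not the authors' implementation): it assumes a two-class problem, models classifier score vectors with one Gaussian per class, clamps the responsibilities of labeled points, runs EM, and then reads a metric off the label posteriors; the function names and the accuracy estimate at the end are illustrative only.

```python
# A minimal, hypothetical sketch of the SSME idea (NOT the authors' implementation):
# fit a two-class Gaussian mixture over classifier score vectors with EM, clamping
# the responsibilities of labeled points, then estimate metrics from label posteriors.
import numpy as np
from scipy.stats import multivariate_normal

def fit_semi_supervised_mixture(scores_l, y_l, scores_u, n_iter=100):
    """scores_l, scores_u: (n, d) arrays of classifier scores; y_l: labels in {0, 1}."""
    X = np.vstack([scores_l, scores_u])
    n_l = len(scores_l)
    # Responsibilities P(y=1 | scores): labeled points are fixed, unlabeled start at 0.5.
    resp = np.concatenate([y_l.astype(float), np.full(len(scores_u), 0.5)])
    for _ in range(n_iter):
        # M-step: class prior and one Gaussian over score vectors per class.
        pi = resp.mean()
        mu1 = (resp[:, None] * X).sum(0) / resp.sum()
        mu0 = ((1 - resp)[:, None] * X).sum(0) / (1 - resp).sum()
        cov1 = np.cov(X.T, aweights=resp) + 1e-6 * np.eye(X.shape[1])
        cov0 = np.cov(X.T, aweights=1 - resp) + 1e-6 * np.eye(X.shape[1])
        # E-step: update posteriors for the unlabeled points only.
        p1 = pi * multivariate_normal.pdf(X, mu1, cov1)
        p0 = (1 - pi) * multivariate_normal.pdf(X, mu0, cov0)
        resp[n_l:] = (p1 / (p1 + p0))[n_l:]
    return resp

# Example metric estimate: accuracy of the classifier whose score is column 0,
# averaging over labeled *and* unlabeled test points via the label posterior.
# resp = fit_semi_supervised_mixture(scores_l, y_l, scores_u)
# preds = (np.vstack([scores_l, scores_u])[:, 0] > 0.5).astype(float)
# est_acc = np.mean(preds * resp + (1 - preds) * (1 - resp))
```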
TBD
April 23
2025-04-23, 4:00–5:00 PM (America/New_York)
ML Tea: Do Large Language Model Benchmarks Test Reliability?
Speakers: Josh Vendrow and Eddie Vendrow
Abstract: When deploying large language models (LLMs), it is important to ensure that these models are not only capable, but also reliable. Many benchmarks have been created to track LLMs' growing capabilities; however, there has been no similar focus on measuring their reliability. To understand the potential ramifications of this gap, we investigate how well current benchmarks quantify model reliability. We find that pervasive label errors can compromise these evaluations, obscuring lingering model failures and hiding unreliable behavior. Motivated by this gap in the evaluation of reliability, we then propose the concept of so-called platinum benchmarks, i.e., benchmarks carefully curated to minimize label errors and ambiguity. As a first attempt at constructing such benchmarks, we revise examples from fifteen existing popular benchmarks. We evaluate a wide range of models on these platinum benchmarks and find that, indeed, frontier LLMs still exhibit failures on simple tasks such as elementary-level math word problems. Analyzing these failures further reveals previously unidentified patterns of problems on which frontier models consistently struggle.
Bios: Josh is a third-year PhD student working with Aleksander Madry. Josh's research focuses on building machine learning models that are safe and robust when deployed in the real world. Eddie is a second-year PhD student advised by Sara Beery and supported by the MIT Presidential Fellowship and NSF GRFP. Eddie is interested in bringing automation to scientific discovery, including by building systems and agents that can autonomously carry out scientific data collection, data science, and analysis.
TBD
April 14
2025-04-14, 4:00–5:00 PM (America/New_York)
ML Tea: Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
Speakers: Tian Jin and Ellie Cheng
Abstract: Decoding with autoregressive large language models (LLMs) traditionally occurs sequentially, generating one token after another. An emerging line of work has explored parallel decoding by identifying and simultaneously generating semantically independent chunks of LLM responses. However, these techniques rely on hand-crafted heuristics tied to syntactic structures like lists and paragraphs, making them rigid and imprecise. We present PASTA, a learning-based system that teaches LLMs to identify semantic independence and express parallel decoding opportunities in their own responses. At its core are PASTA-LANG and its interpreter: PASTA-LANG is an annotation language that allows LLMs to express semantic independence in their own responses; the interpreter acts on these annotations to orchestrate parallel decoding on the fly at inference time. Through a two-stage finetuning process, we train LLMs to generate PASTA-LANG annotations that optimize both response quality and decoding speed. Evaluation on AlpacaEval, an instruction-following benchmark, shows that our approach Pareto-dominates existing methods in terms of decoding speed and response quality; our results demonstrate geometric mean speedups ranging from 1.21× to 1.93× with corresponding quality changes of +2.2% to -7.1%, measured as length-controlled win rates.
Bios: Tian Jin is a 5th-year Ph.D. student at MIT, advised by Michael Carbin and Jonathan Ragan-Kelley. His research focuses on machine learning and programming systems. Previously, Tian was a Research Engineer at IBM Research, where he led efforts to enable deep neural network inference on IBM mainframe machines and contributed to compiler support for the IBM Summit Supercomputer. He holds a dual degree in Computer Science and Mathematics from Haverford College. Ellie is a 3rd-year PhD student at CSAIL, advised by Michael Carbin. Her research interests are at the intersection of programming languages and machine learning.
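To make the orchestration idea concrete, here is a toy sketch of decoding independently marked chunks in parallel. The chunk-list interface, the generate_chunk helper, and the use of threads are invented purely for illustration; they are not PASTA-LANG syntax or the actual PASTA interpreter.

```python
# Toy illustration of the interpreter's job: decode chunks the model has marked as
# semantically independent in parallel, then stitch them together. The chunk interface
# and `generate_chunk` helper are invented for illustration; this is NOT PASTA-LANG
# syntax or the actual PASTA interpreter.
from concurrent.futures import ThreadPoolExecutor

def generate_chunk(prompt: str) -> str:
    """Stand-in for an LLM call that decodes one independent chunk of the response."""
    return f"[completion for: {prompt!r}]"

def decode_independent_chunks(chunk_prompts: list[str]) -> str:
    # Independent chunks can be decoded concurrently instead of strictly token-by-token.
    with ThreadPoolExecutor() as pool:
        chunks = list(pool.map(generate_chunk, chunk_prompts))
    return "\n".join(chunks)

print(decode_independent_chunks(["pros of option A", "cons of option A", "a short summary"]))
```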
TBD
April 07
2025-04-07, 4:00–5:00 PM (America/New_York)
ML Tea: Activation-Informed Merging of LLMs
Speaker: Kaveh Alimohammadi
Title: Activation-Informed Merging of LLMs
Abstract: Model merging has emerged as an efficient strategy for combining multiple fine-tuned large language models (LLMs) while avoiding the computational overhead of retraining. However, existing methods often overlook the importance of activation-space information in guiding the merging process. In this talk, I will introduce Activation-Informed Merging (AIM), a novel technique that enhances the robustness and performance of merged models by incorporating activation-space insights. AIM is designed as a complementary framework that can be applied to any merging approach, preserving critical weights from the base model through principles drawn from continual learning and model compression. By utilizing a task-agnostic calibration set, AIM selectively prioritizes essential parameters, leading to significant performance improvements across multiple benchmarks, with up to a 40% increase in effectiveness.
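A rough sketch of what "activation-informed" merging could look like in code follows. The importance proxy (mean absolute activation of Linear outputs on a calibration set) and the per-unit gating rule are assumptions made for illustration; they are not the exact AIM formulation.

```python
# A rough, hypothetical sketch of activation-informed merging (NOT the exact AIM method):
# measure how active each Linear layer's output units are on a task-agnostic calibration
# set, then keep highly active units closer to the base model when merging weights.
import torch
import torch.nn as nn

@torch.no_grad()
def activation_informed_merge(base: nn.Module, other_state: dict, calib_batches, alpha=0.5):
    acts = {}

    def make_hook(name):
        def hook(module, inputs, output):
            # Mean absolute activation per output unit, accumulated over the calibration set.
            per_unit = output.detach().abs().mean(dim=tuple(range(output.dim() - 1)))
            acts[name] = acts.get(name, 0) + per_unit
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in base.named_modules() if isinstance(m, nn.Linear)]
    for batch in calib_batches:          # assumes each batch is a ready-to-use input tensor
        base(batch)
    for h in handles:
        h.remove()

    merged = {}
    for name, p in base.state_dict().items():
        merged[name] = (1 - alpha) * p + alpha * other_state[name]   # plain interpolation
        module_name = name.rsplit(".", 1)[0]
        if module_name in acts and name.endswith("weight"):
            # Gate in [0, 1]: output units that are highly active on the calibration set
            # stay close to the base model's weights.
            gate = (acts[module_name] / (acts[module_name].max() + 1e-8)).view(-1, 1)
            merged[name] = gate * p + (1 - gate) * merged[name]
    return merged   # load with base.load_state_dict(merged) or into a fresh copy
```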
TBD
March 17
2025-03-17, 4:00–4:45 PM (America/New_York)
ML Tea: Aggregating fMRI datasets for training brain-optimized models of human vision
Speaker: Benjamin Lahner
Title: Aggregating fMRI datasets for training brain-optimized models of human vision
Abstract: Large-scale fMRI datasets are revolutionizing our understanding of the neural processes underlying human perception, driving new breakthroughs in neuroscience and computational modeling. Yet individual fMRI data collection efforts remain constrained by practical limitations in scan time, creating an inherent tradeoff between subjects, stimuli, and stimulus repetitions. This tradeoff often compromises stimulus diversity, data quality, and generalizability of findings, such that even the largest fMRI datasets cannot fully leverage the power of high-parameter artificial neural network models and high-dimensional feature spaces. To overcome these challenges, we introduce MOSAIC (Meta-Organized Stimuli And fMRI Imaging data for Computational modeling): a scalable framework for aggregating fMRI responses across multiple subjects and datasets. We preprocessed and registered eight event-related fMRI vision datasets (Natural Scenes Dataset, Natural Object Dataset, BOLD Moments Dataset, BOLD5000, Human Actions Dataset, Deeprecon, Generic Object Decoding, and THINGS) to the fsLR32k cortical surface space with fMRIPrep to obtain 430,007 fMRI-stimulus pairs across 93 subjects and 162,839 unique stimuli. We estimated single-trial beta values with GLMsingle (Prince et al., 2022), obtaining parameter estimates of similar or higher quality than the originally published datasets. Critically, we curated the dataset by eliminating stimuli with perceptual similarity above a defined threshold to prevent test-train leakage. This rigorous pipeline resulted in a well-defined stimulus-response dataset with 144,360 training stimuli, 18,145 test stimuli, and 334 synthetic stimuli, well-suited for building and evaluating robust models of human vision. We show preliminary results using MOSAIC to investigate how the internal representations of brain-optimized neural networks differ from those of task-optimized neural networks, and we perform a large-scale decoding analysis that highlights the importance of stimulus set diversity. This framework empowers the vision science community to collaboratively generate a scalable, generalizable foundation for studying human vision.
Bio: Ben Lahner is a PhD candidate in computational neuroscience working with Dr. Aude Oliva. His research combines fMRI data with machine learning and deep learning techniques to better understand facets of the human visual system. His previous work has investigated visual memory, action understanding, and video decoding from brain activity patterns.
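The leakage screen described above (dropping training stimuli that are too perceptually similar to any test stimulus) can be sketched in a few lines; the embedding source and the 0.95 cutoff below are placeholders, not the MOSAIC pipeline's actual choices.

```python
# Sketch of the perceptual-similarity screen described above: drop any training stimulus
# whose similarity to some test stimulus exceeds a threshold. The embedding source and
# the 0.95 cutoff are placeholders, not the MOSAIC pipeline's actual choices.
import numpy as np

def screen_train_test_leakage(train_emb, test_emb, threshold=0.95):
    """train_emb, test_emb: (n, d) L2-normalized perceptual embeddings of stimuli."""
    sims = train_emb @ test_emb.T            # cosine similarities, shape (n_train, n_test)
    keep = sims.max(axis=1) < threshold      # keep stimuli far from every test stimulus
    return np.where(keep)[0]                 # indices of retained training stimuli
```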
TBD
March 10
2025-03-10, 4:00–5:00 PM (America/New_York)
ML Tea: Unsupervised Discovery of Interpretable Structure in Complex Systems
Speaker: Mark Hamilton
Abstract: How does the human mind make sense of raw information without being taught how to see or hear? In this talk we will explore how to build algorithms that can uncover interpretable structure from large collections of unsupervised data like images and video. First, I will describe how to classify every pixel of a collection of images without any human annotations (unsupervised semantic segmentation) by distilling self-supervised vision models. Second, we’ll see how this basic idea leads us to a new unifying theory of representation learning, and I will show how 20 different common machine learning methods, such as dimensionality reduction, clustering, contrastive learning, and spectral methods, emerge from a single unified equation. Finally, we’ll use this unified theory to create algorithms that can decode natural language just by watching unlabeled videos of people talking, without any knowledge of text. This work is the first step in our broader effort to translate animals using large-scale, unsupervised, and interpretable learners, and the talk will conclude with some of our most recent efforts to analyze the complex vocalizations of Atlantic spotted dolphins.
Bio: Mark Hamilton is a PhD student in William T. Freeman's lab at the MIT Computer Science & Artificial Intelligence Laboratory. He is also a Senior Engineering Manager at Microsoft, where he leads a team building large-scale distributed ML products for Microsoft’s largest databases. Mark is interested in how we can use unsupervised machine learning to discover scientific "structure" in complex systems. Mark values working on projects for social, cultural, and environmental good and aims to use his algorithms to help humans solve challenges they cannot solve alone.
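As a crude stand-in for the unsupervised-semantic-segmentation idea, one can simply cluster per-pixel features from a frozen self-supervised backbone; the speaker's approach additionally distills these features into a segmentation model, which this sketch omits, and the class count is an arbitrary placeholder.

```python
# A crude stand-in for unsupervised semantic segmentation: cluster per-pixel features
# from a frozen self-supervised backbone with k-means. The speaker's approach also
# distills these features into a segmentation model, which this sketch omits.
import numpy as np
from sklearn.cluster import KMeans

def cluster_pixel_features(feats, n_classes=27):
    """feats: (n_images, H, W, d) per-pixel features from a self-supervised model."""
    n, H, W, d = feats.shape
    labels = KMeans(n_clusters=n_classes, n_init=4).fit_predict(feats.reshape(-1, d))
    return labels.reshape(n, H, W)   # one pseudo-class per pixel, no human annotations
```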
TBD
March 03
2025-03-03, 4:00–5:00 PM (America/New_York)
ML Tea: Learning Generative Models from Corrupted Data
Speaker: Giannis Daras
Abstract: In scientific applications, generative models are used to regularize solutions to inverse problems. The quality of the models depends on the quality of the data on which they are trained. While natural images are abundant, in scientific applications access to high-quality data is scarce, expensive, or even impossible. For example, in MRI the quality of the scan is proportional to the time spent in the scanner, and in black-hole imaging we can only access lossy measurements. Contrary to high-quality data, noisy samples are generally more accessible. If we had a method to transform noisy points into clean ones, e.g., by sampling from the posterior, we could address these challenges. A standard approach would be to use a pre-trained generative model as a prior. But how can we train these priors in the first place without having access to data? We show that one can escape this chicken-and-egg problem using diffusion-based algorithms that account for the corruption at training time. We present the first algorithm that provably recovers the distribution given only noisy samples of a fixed variance. We extend our algorithm to account for heterogeneous data where each training sample has a different noise level. The underlying mathematical tools can be generalized to linear measurements with the potential of accelerating MRI. Our method has deep connections to the literature on learning supervised models from corrupted data, such as SURE and Noise2X. Our framework opens exciting possibilities for generative modeling in data-constrained scientific applications. We are actively working on applying this to denoise proteins, and we present some first results in this direction.
Bio: Giannis Daras is a postdoctoral researcher at MIT working closely with Prof. Costis Daskalakis and Prof. Antonio Torralba. Prior to MIT, Giannis completed his Ph.D. at UT Austin under the supervision of Prof. Alexandros G. Dimakis. Giannis is interested in generative modeling and the applications of generative models to inverse problems. A key aspect of his work involves developing algorithms for learning generative models from noisy data. His research has broad implications across various fields, including scientific applications, privacy and copyright concerns, and advancing data-efficient learning techniques.
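One standard identity behind this line of work, stated here for context rather than as the talk's specific algorithm, is Tweedie's formula for Gaussian corruption, which links posterior-mean denoising to the score of the noisy distribution:

```latex
% If a clean sample x is corrupted as  y = x + \sigma\,\varepsilon,  \varepsilon \sim \mathcal{N}(0, I),
% then Tweedie's formula gives the posterior mean of the clean sample in terms of the
% score of the *noisy* distribution p_\sigma:
\mathbb{E}\left[\, x \mid y \,\right] \;=\; y \;+\; \sigma^{2}\, \nabla_{y} \log p_{\sigma}(y).
```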
TBD
February 24
2025-02-24, 4:00–5:00 PM (America/New_York)
ML Tea: Score-of-Mixture Training: One-Step Generative Model Training via Score Estimation of Mixture Distributions
Abstract: We propose Score-of-Mixture Training (SMT), a novel framework for training one-step generative models by minimizing a class of divergences called the α-skew Jensen–Shannon divergence. At its core, SMT estimates the score of mixture distributions between real and fake samples across multiple noise levels. Similar to consistency models, our approach supports both training from scratch (SMT) and distillation using a pretrained diffusion model, which we call Score-of-Mixture Distillation (SMD). It is simple to implement, requires minimal hyperparameter tuning, and ensures stable training. Experiments on CIFAR-10 and ImageNet 64×64 show that SMT/SMD are competitive with and can even outperform existing methods.
Bio: Tejas is a final-year PhD student in the Signals, Information and Algorithms Lab, advised by Professor Gregory Wornell. His research interests are centered around statistical inference, information theory, and generative modeling, with a recent focus on fundamental and applied aspects of score estimation and diffusion-based generative models. During his PhD, Tejas has interned at Meta AI, Google Research, Adobe Research, and Mitsubishi Electric Research Labs. He is currently a recipient of the MIT Claude E. Shannon Fellowship.
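For reference, one common way of writing the α-skew Jensen–Shannon divergence (weight conventions vary across papers) makes explicit the real/fake mixture whose score is estimated:

```latex
% One common convention (weightings vary across papers); m_alpha is the real/fake mixture
% whose score SMT estimates at multiple noise levels:
m_{\alpha} \;=\; \alpha\, p_{\mathrm{real}} \;+\; (1-\alpha)\, p_{\mathrm{fake}}, \qquad
D_{\mathrm{JS}}^{(\alpha)}\!\left(p_{\mathrm{real}} \,\middle\|\, p_{\mathrm{fake}}\right)
\;=\; \alpha\, \mathrm{KL}\!\left(p_{\mathrm{real}} \,\middle\|\, m_{\alpha}\right)
\;+\; (1-\alpha)\, \mathrm{KL}\!\left(p_{\mathrm{fake}} \,\middle\|\, m_{\alpha}\right).
```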
TBD
February 19
2025-02-19, 4:00–5:00 PM (America/New_York)
ML Tea Talk: Theoretical Perspectives on Data Quality and Selection
Abstract: Though it has always been understood that data quality directly affects the quality of our predictions, the large-scale data requirements of modern machine learning have brought to the fore the need to develop a richer vocabulary for understanding the quality of collected data for the prediction tasks of interest, and the need to develop algorithms that use collected data most effectively. Though this has been studied in various contexts, such as distribution shift, multitask learning, and sequential decision making, there remains a need to develop techniques that address problems faced in practice. Towards the aim of starting a dialogue between the practical and theoretical perspectives on these important problems, I will survey some recent techniques developed in TCS and statistics addressing data quality and selection.
Bio: Abhishek Shetty is an incoming Catherine M. and James E. Allchin Early-Career Assistant Professor in the School of Computer Science at Georgia Tech and is currently a FODSI Postdoctoral Fellow at MIT, hosted by Sasha Rakhlin, Ankur Moitra, and Costis Daskalakis. He graduated from the department of EECS at UC Berkeley, advised by Nika Haghtalab. His interests lie at the intersection of machine learning, theoretical computer science, and statistics, and are aimed at developing statistically and computationally efficient algorithms for inference. His research has been recognized with the Apple AI/ML Fellowship and the American Statistical Association SCGS best student paper award.
TBD
December 02
Truthfulness of Calibration Measures
Mingda Qiao
MIT CSAIL
2024-12-02, 4:00–4:30 PM (America/New_York)
Truthfulness of Calibration Measures
Abstract: We initiate the study of the truthfulness of calibration measures in sequential prediction. A calibration measure is said to be truthful if the forecaster (approximately) minimizes the expected penalty by predicting the conditional expectation of the next outcome, given the prior distribution of outcomes. Truthfulness is an important property of calibration measures, ensuring that the forecaster is not incentivized to exploit the system with deliberately poor forecasts. This makes it an essential desideratum for calibration measures, alongside typical requirements such as soundness and completeness. We conduct a taxonomy of existing calibration measures and their truthfulness. Perhaps surprisingly, we find that all of them are far from being truthful. That is, under existing calibration measures, there are simple distributions on which a polylogarithmic (or even zero) penalty is achievable, while truthful prediction leads to a polynomial penalty. Our main contribution is the introduction of a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE), under which truthful prediction is optimal up to a constant multiplicative factor.
Bio: Mingda Qiao is a FODSI postdoc hosted by Ronitt Rubinfeld in the MIT Theory of Computation (TOC) Group and an incoming assistant professor at UMass Amherst (starting Fall 2025). His research focuses on the theory of prediction, learning, and decision-making in sequential settings, as well as collaborative federated learning. Prior to MIT, Mingda was a FODSI postdoc at UC Berkeley, received his PhD in Computer Science from Stanford University, and received his BEng in Computer Science from Tsinghua University.
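In symbols, the truthfulness property defined in the abstract can be read as follows (a paraphrase of the definition above, not an additional result):

```latex
% A calibration measure Err is (approximately) truthful if, for every distribution D over
% outcome sequences, predicting the conditional mean at every step,
p_t \;=\; \mathbb{E}_{D}\!\left[\, y_t \;\middle|\; y_1, \dots, y_{t-1} \,\right],
% (approximately) minimizes the expected penalty  E_D[Err]  over all forecasting strategies.
```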
November 25
Power of inclusion: Enhancing polygenic prediction with admixed individuals
Yosuke Tanigawa
MIT CSAIL
2024-11-25, 4:00–5:00 PM (America/New_York)
Power of inclusion: Enhancing polygenic prediction with admixed individuals
Zoom Link: https://mit.zoom.us/j/94204370795?pwd=eFZwYXVuWmVsQzE1UTRZN2VtY0lkUT09 (passcode: 387975)
Abstract: Predicting heritable traits and genetic liability of disease from individuals’ genomes has important implications for tailoring medical prevention and intervention strategies in precision medicine. Polygenic score (PGS), a statistical approach, has recently attracted substantial attention due to its potential relevance in clinical practice. Admixed individuals offer unique opportunities for addressing limited transferability in PGSs. However, they are rarely considered in PGS training, given the challenges in representing ancestry-matched linkage-disequilibrium reference panels for admixed individuals. Here we present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data and is thus naturally applicable to admixed individuals. We validate our approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to n = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements in Africans by 48.9% on average across 60 quantitative traits and up to 50-fold improvements for some traits (neutrophil count, R2 = 0.058) over the baseline model trained on the same number of European individuals. When we allowed iPGS to use n = 284,661 individuals, we observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for the other individuals. We further developed iPGS+refit to jointly model the ancestry-shared and -dependent genetic effects when heterogeneous genetic associations were present. For neutrophil count, for example, iPGS+refit showed the highest predictive performance in the African group (R2 = 0.115), which exceeds the best predictive performance for the White British group (R2 = 0.090 in the iPGS model), even though only 1.49% of individuals used in the iPGS training are of African ancestry. Our results indicate the power of including diverse individuals in developing more equitable PGS models.
Bio: Yosuke Tanigawa, PhD, is a research scientist at MIT’s Computer Science and Artificial Intelligence Lab. To incorporate interindividual differences in disease prevention and treatment, he develops computational and statistical methods, focusing on predictive modeling with high-dimensional human genetics data, multi-omic dissection of disease heterogeneity, and therapeutic target discovery. His recent work focuses on inclusive training strategies for genetic prediction algorithms and dissecting the molecular, cellular, and genetic basis of phenotypic heterogeneity in Alzheimer’s disease. He has received many awards, including the Charles J. Epstein Trainee Awards for Excellence in Human Genetics Research and MIT Technology Review’s Innovators Under 35 Japan.
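The core fitting step, as described above, is penalized regression on individual-level genotypes with admixed individuals simply included in the training matrix. A minimal sketch follows, using scikit-learn's Lasso as a stand-in for the exact penalized-regression solver and omitting covariates, cross-validation, and the iPGS+refit extension.

```python
# Minimal sketch of the core fitting step described above: penalized regression on
# individual-level genotype dosages, with admixed individuals included directly in the
# training matrix. scikit-learn's Lasso is a stand-in for the exact solver; covariates,
# cross-validation, and the iPGS+refit extension are omitted.
import numpy as np
from sklearn.linear_model import Lasso

def fit_inclusive_pgs(genotypes, phenotype, alpha=1e-3):
    """genotypes: (n_individuals, n_variants) dosage matrix; phenotype: (n_individuals,)."""
    model = Lasso(alpha=alpha, max_iter=10_000)
    model.fit(genotypes, phenotype)
    return model.coef_                       # per-variant effect-size estimates

# Polygenic score for new individuals:
# pgs = genotypes_new @ fit_inclusive_pgs(genotypes_train, trait_train)
```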
November 18
Dependence Induced Representation Learning
Xiangxiang Xu
EECS/RLE, MIT
2024-11-18, 4:00–5:00 PM (America/New_York)
Dependence Induced Representation Learning
Abstract: Despite the vast progress in deep learning practice, theoretical understanding of learned feature representations remains limited. In this talk, we discuss three fundamental questions from a unified statistical perspective: (1) What representations carry useful information? (2) How are representations learned by distinct algorithms related? (3) Can we separate representation learning from solving specific tasks? We formalize representations that extract statistical dependence from data, termed dependence-induced representations. We prove that representations are dependence-induced if and only if they can be learned from specific features defined by Hirschfeld–Gebelein–Rényi (HGR) maximal correlation. This separation theorem signifies the key role of HGR features in representation learning and enables a modular design of learning algorithms. Specifically, we demonstrate the optimality of HGR features in simultaneously achieving different design objectives, including minimal sufficiency (Tishby's information bottleneck), information maximization, enforcing uncorrelated features (VICReg), and encoding information at various granularities (Matryoshka representation learning). We further illustrate that by adapting HGR features, we can obtain representations learned by distinct practices, from cross-entropy or hinge loss minimization, non-negative feature learning, and neural density ratio estimators to their regularized variants. We also discuss the applications of our analyses in interpreting learning phenomena such as neural collapse, understanding existing self-supervised learning practices, and obtaining more flexible designs, e.g., inference-time hyperparameter tuning.
Bio: Xiangxiang Xu received the B.Eng. and Ph.D. degrees in electronic engineering from Tsinghua University, Beijing, China, in 2014 and 2020, respectively. He is a postdoctoral associate in the Department of EECS at MIT. His research focuses on information theory, statistical learning, representation learning, and their applications in understanding and developing learning algorithms. He is a recipient of the 2016 IEEE PES Student Prize Paper Award in Honor of T. Burke Hayes and the 2024 ITA (Information Theory and Applications) Workshop Sand Award.
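For reference, the Hirschfeld–Gebelein–Rényi maximal correlation invoked above is defined as

```latex
% Definition of the HGR maximal correlation referenced above:
\rho_{\mathrm{HGR}}(X; Y) \;=\; \sup_{f,\, g}\ \mathbb{E}\!\left[ f(X)\, g(Y) \right]
\quad \text{s.t.} \quad \mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0, \quad
\mathbb{E}\!\left[ f(X)^{2} \right] = \mathbb{E}\!\left[ g(Y)^{2} \right] = 1,
```

with the supremum over (measurable) functions of X and Y; loosely, the HGR features referenced in the abstract are maximizers of this objective.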
November 13
ContextCite: Attributing Model Generation to Context
Benjamin Cohen Wang
MIT CSAIL
2024-11-13, 4:00–5:00 PM (America/New_York)
ContextCite: Attributing Model Generation to Context
October 28
Generative Models for Biomolecular Prediction, Dynamics, and Design
Hannes Stärk and Bowen Jing
MIT CSAIL
2024-10-28, 4:00–5:00 PM (America/New_York)
Generative Models for Biomolecular Prediction, Dynamics, and Design
Abstract: We lay out the three avenues in which we think generative models are especially valuable for modeling biomolecules. 1) Hard prediction tasks can be better addressed with generative models that can suggest and rank multiple solutions (e.g. docking). 2) The dynamics and conformations of biomolecules can be captured with generative models (e.g. protein conformational ensembles and MD trajectories). 3) Designing new biomolecules can be accelerated, informed by samples or likelihoods from generative models (e.g. protein binder or regulatory DNA design).
32-G882 (Hewlett)
October 21
2024-10-21, 4:00–5:00 PM (America/New_York)
Objective Approaches in a Subjective Medical World
Abstract: In today’s healthcare system, patients often feel disconnected from clinical professionals and their care journey. They receive a “one-size-fits-all” plan and are left out of the decision-making process, which can lead to a less satisfying experience. My research focuses on applying advanced AI technologies, including large language models, machine learning, and IoT, to address challenges in healthcare, particularly in patient-centered healthcare delivery. I aim to enhance the accuracy and efficiency of healthcare systems by using these "objective approaches" to navigate the subjective aspects of medical practice, such as clinician notes and patient preferences found in electronic health records. A key aspect of my work is improving the transparency of AI-based healthcare applications, making them more understandable and trustworthy for both clinicians and patients, by addressing critical issues such as building trust in AI systems and ensuring these technologies effectively meet the needs of patients and healthcare providers. Additionally, I emphasize the importance of personalizing healthcare by considering each patient's unique circumstances, including their preferences and socio-economic conditions. This research applies AI across various areas, from specific diseases like cancer to broader healthcare contexts, with the goal of improving both the delivery and experience of healthcare. My work contributes to the development of AI tools that not only enhance clinical decision-making but also foster better human-AI interaction, ultimately leading to improved healthcare outcomes.
32-G882
October 16
Economic Representations
Suproteem Sarkar
Harvard University
2024-10-16, 4:00–5:00 PM (America/New_York)
Economic Representations
32-G882
October 07
Contextualizing Self-Supervised Learning: A New Path Ahead
Yifei Wang
CSAIL
2024-10-07, 4:00–4:30 PM (America/New_York)
Contextualizing Self-Supervised Learning: A New Path Ahead
Abstract: Self-supervised learning (SSL) has achieved remarkable progress over the years, particularly in visual domains. However, recent advancements have plateaued due to performance bottlenecks, and more focus has shifted towards generative models. In this talk, we step back to analyze existing SSL paradigms and identify the lack of context as their most critical obstacle. To address this, we explore two approaches that incorporate contextual knowledge into SSL: (1) Contextual Self-Supervised Learning, in which learned representations adapt their inductive biases to diverse contexts, enhancing the flexibility and generality of SSL; and (2) Self-Correction, which allows foundation models to refine themselves by reflecting on their own predictions within a dynamically evolving context. These insights illustrate new paths to craft self-supervision and highlight context as a key ingredient for building general-purpose SSL.
Paper links: In-Context Symmetries: Self-Supervised Learning through Contextual World Models (https://arxiv.org/pdf/2405.18193) and A Theoretical Understanding of Self-Correction through In-context Alignment (https://arxiv.org/pdf/2405.18634). Both papers covered in this talk were accepted to NeurIPS 2024; the theoretical work on understanding self-correction received the Spotlight Award at the ICML 2024 ICL Workshop.
Bio: Yifei Wang is a postdoc at CSAIL, advised by Prof. Stefanie Jegelka. He earned his bachelor’s and Ph.D. degrees from Peking University. Yifei is generally interested in machine learning and representation learning, with a focus on bridging the theory and practice of self-supervised learning. His first-author works have been recognized by multiple best paper awards, including the Best ML Paper Award at ECML-PKDD 2021, the Silver Best Paper Award at the ICML 2021 AdvML Workshop, and the Spotlight Award at the ICML 2024 ICL Workshop.
32-G882 (Hewlett Room)
September 23
Learning to Decode Collaboratively with Multiple Language Models
Shannon Shen
MIT CSAIL
2024-09-23, 4:00–4:30 PM (America/New_York)
Learning to Decode Collaboratively with Multiple Language Models
32-G882, Hewlett Room
September 16
Multi-sensory perception from top to down
Anna Min
CSAIL
2024-09-16, 4:00–4:30 PM (America/New_York)
Multi-sensory perception from top to down
Abstract: Human sensory experiences, such as vision, hearing, touch, and smell, serve as natural interfaces for perceiving and reasoning about the world around us. Understanding 3D environments is crucial for applications like video processing, robotics, and augmented reality. This work explores how material properties and microgeometry can be learned through cross-modal associations between sight, sound, and touch. I will introduce a method that leverages in-the-wild online videos to study interactable audio generation via dense visual cues. Additionally, I will share recent advancements in multimodal scene understanding and discuss future directions for the field.
Bio: Anna is a senior undergraduate at Tsinghua University. Her previous research lies in multimodal perception, from the perspectives of audio and vision. She is an intern in Jim Glass's group.
32-G882, Hewlett Room
May 02
Decomposing Predictions by Modeling Model Computation
Harshay Shah
MIT CSAIL
2024-05-02, 4:00–4:30 PM (America/New_York)
Decomposing Predictions by Modeling Model Computation
Abstract: How does the internal computation of a machine learning model transform inputs into predictions? In this paper, we introduce a task called component modeling that aims to address this question. The goal of component modeling is to decompose an ML model's prediction in terms of its components -- simple functions (e.g., convolution filters, attention heads) that are the "building blocks" of model computation. We focus on a special case of this task, component attribution, where the goal is to estimate the counterfactual impact of individual components on a given prediction. We then present COAR, a scalable algorithm for estimating component attributions; we demonstrate its effectiveness across models, datasets, and modalities. Finally, we show that component attributions estimated with COAR directly enable model editing across five tasks, namely: fixing model errors, "forgetting" specific classes, boosting subpopulation robustness, localizing backdoor attacks, and improving robustness to typographic attacks.
Paper: https://arxiv.org/abs/2404.11534
Blog post: https://gradientscience.org/modelcomponents/
Bio: Harshay is a PhD student at MIT CSAIL, advised by Aleksander Madry. His research interests are broadly in developing tools to understand and steer model behavior. Recently, he has been working on understanding how training data and learning algorithms collectively shape neural network representations.
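A minimal sketch of the component-attribution idea, simplified rather than COAR's exact procedure: ablate random subsets of components, record the model output, and regress the output on the ablation mask, so that the regression coefficients serve as per-component estimates of counterfactual impact. The sampling scheme, regression choice, and scale below are placeholders.

```python
# Minimal sketch of the component-attribution idea (a simplification, not COAR's exact
# procedure): ablate random subsets of components, record the model output, and fit a
# linear model from ablation masks to outputs; coefficients estimate per-component impact.
import numpy as np
from sklearn.linear_model import Ridge

def estimate_component_attributions(run_with_mask, n_components, n_samples=2000, keep_prob=0.9):
    """run_with_mask(mask) -> scalar output of the model with components where mask == 0 ablated."""
    masks = (np.random.rand(n_samples, n_components) < keep_prob).astype(float)
    outputs = np.array([run_with_mask(m) for m in masks])
    attributions = Ridge(alpha=1.0).fit(masks, outputs).coef_
    return attributions   # estimated counterfactual impact of each component on this prediction
```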
Room 32-G449 (Patil/Kiva)