Towards Interpretable and Operationalized Fairness in Machine Learning
Thesis advisor: Lalana Kagal
Thesis committee: Peter Szolovits, Brian Hedden
Abstract
Machine learning systems are increasingly deployed in sensitive, real-world settings, yet persistent biases in model predictions continue to disadvantage marginalized groups. This thesis develops practical and interpretable methods for understanding and mitigating such biases in natural language generation and computer vision.
For large language models, we introduce a decoding-time approach that leverages small biased and anti-biased expert models to obtain a debiasing signal that is added to the LLM output. This approach combines computational efficiency (fine-tuning a small expert model rather than re-training a large model) with interpretability (the probability shift introduced by debiasing can be examined directly).
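As a rough sketch of how such a decoding-time correction can work (the exact formulation in the thesis may differ), the debiasing signal can be taken as the difference between the anti-biased and biased experts' logits, scaled by a strength parameter; the function name debiased_logits, the parameter alpha, and the use of PyTorch are illustrative assumptions.

```python
import torch

def debiased_logits(lm_logits, biased_logits, antibiased_logits, alpha=1.0):
    """Add a debiasing signal from two small expert models to the base LM logits.

    Assumed formulation: signal = anti-biased expert logits minus biased expert
    logits, scaled by a strength parameter alpha.
    """
    signal = antibiased_logits - biased_logits
    return lm_logits + alpha * signal

# Toy example over a 5-token vocabulary.
vocab_size = 5
lm = torch.randn(vocab_size)
biased = torch.randn(vocab_size)
antibiased = torch.randn(vocab_size)

adjusted = debiased_logits(lm, biased, antibiased, alpha=2.0)

# The per-token probability shift induced by debiasing can be inspected
# directly, which is the interpretability benefit mentioned above.
shift = torch.softmax(adjusted, dim=-1) - torch.softmax(lm, dim=-1)
print(shift)
```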
In computer vision, we leverage concept bottleneck models (CBMs), which map images to human-understandable concepts, to improve transparency and to mask proxy features that correlate with sensitive attributes. To counter information leakage in CBMs and improve the fairness-performance tradeoff, we introduce three mitigation strategies: (1) reducing leakage with a top-k concept filter, (2) removing concepts that correlate strongly with gender, and (3) applying adversarial debiasing to further suppress sensitive information.

Together, these contributions illustrate how interpretability and operationalization can make fairness interventions more trustworthy, scalable, and aligned with real deployment needs.
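A minimal sketch of how the first two mitigation strategies might be expressed, assuming concept scores arrive as a tensor of shape (batch, num_concepts) and a binary sensitive attribute is available per example; the function names, the correlation threshold, and the choice of k are illustrative assumptions, not the thesis implementation.

```python
import torch

def topk_concept_filter(concept_scores, k):
    """Keep only the k highest-scoring concepts per example, zeroing the rest
    to limit information leakage through low-activation concepts."""
    topk = torch.topk(concept_scores, k, dim=-1)
    mask = torch.zeros_like(concept_scores)
    mask.scatter_(-1, topk.indices, 1.0)
    return concept_scores * mask

def drop_correlated_concepts(concept_scores, sensitive, threshold=0.5):
    """Zero out concepts whose Pearson correlation with a sensitive attribute
    (e.g. gender) exceeds the given threshold in absolute value."""
    c = concept_scores - concept_scores.mean(dim=0)
    s = sensitive - sensitive.mean()
    corr = (c * s.unsqueeze(-1)).sum(dim=0) / (c.norm(dim=0) * s.norm() + 1e-8)
    keep = (corr.abs() < threshold).float()
    return concept_scores * keep

# Toy usage: 8 images, 6 concepts, binary sensitive attribute.
scores = torch.rand(8, 6)
gender = torch.randint(0, 2, (8,)).float()
filtered = drop_correlated_concepts(topk_concept_filter(scores, k=3), gender)
print(filtered)
```

In the adversarial variant (strategy 3), the concept-to-label predictor would additionally be trained against an adversary that tries to recover the sensitive attribute from the retained concepts.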