Thesis Defense: Operationalizing Reliable Machine Learning: From Data Collection to Model Presentation

Speaker

Aparna Balagopalan

Host

Marzyeh Ghassemi

Title: Operationalizing Reliable Machine Learning: From Data Collection to Model Presentation

Speaker: Aparna Balagopalan

Abstract: Machine learning holds great promise for positively impacting users' lives. However, failure modes such as encoded biases can reduce its reliability in the real world. As emphasized by prior work, reliability concerns can arise throughout the pipeline, from data collection to the point at which model predictions are presented to end users. To implement reliable ML systems in practice, it is essential to address data issues such as non-random missingness, modeling concerns such as unfair optimization objectives, and model presentation strategies that incorrectly influence user trust. In this thesis, we therefore propose strategies to measure, intervene on, and improve reliability throughout the ML pipeline. We focus on three components that reliability critically depends on. First, to support responsible data collection, we perform two case studies. We investigate how label collection practices that do not match normative deployment contexts introduce measurement error, impacting downstream ML performance. We also explore time-based missingness of demographic data in a large retrospective health dataset and its implications for fairness-related conclusions. Second, we propose two modeling-related strategies to improve reliability: an approach that mitigates the impact of label noise by filtering out noisy data, and a method for creating fair rankings. Finally, recognizing that models are only effective if trusted by end users, we analyze the fairness of explanation strategies. In summary, this thesis takes steps towards operationalizing reliability in real-world ML systems, emphasizing the human processes and design choices that shape each stage.

Date: Monday, 9 June 2025
Time: 1pm
Room: 32-D677, Stata Center
Zoom: https://mit.zoom.us/j/94956291352

Committee: Marzyeh Ghassemi, Arvind Satyanarayan, Gillian Hadfield

Related Events

July 09

Perfusion Imaging via Mass Transport

June 06

Thesis Defense: Data-Driven Methods for Health Equity