Removing Biases from Molecular Representations via Information Maximization
Speaker
Chenyu Wang
EECS MIT
Host
Thien Le
CSAIL MIT
Abstract: High-throughput drug screening, which uses cell imaging or gene expression measurements as readouts of drug effect, is a critical tool in biotechnology for assessing and understanding the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweighs samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE’s superior performance across a range of tasks, including molecular property prediction and molecule-phenotype retrieval. Additionally, we show that InfoCORE offers a versatile framework that addresses general distribution shifts and data fairness issues by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.
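For readers unfamiliar with the quantity named in the abstract, the standard textbook definition of conditional mutual information is sketched below. The symbols Z_d (representation of the drug structure), Z_g (representation of the phenotypic readout), and B (batch identifier) are notation assumed here for illustration only; the talk's own notation and the specific variational lower bound it derives may differ.

```latex
% Standard definition of conditional mutual information.
% Z_d, Z_g, and B are illustrative symbols, not taken from the announcement.
I(Z_d ; Z_g \mid B)
  = \mathbb{E}_{p(z_d, z_g, b)}
    \left[ \log \frac{p(z_d, z_g \mid b)}{p(z_d \mid b)\, p(z_g \mid b)} \right]
```

Conditioning on B means that dependence between the two representations is measured within each batch, so association that arises only because two samples share a batch does not count toward this quantity.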
Bio: I am a second-year PhD student at MIT EECS, advised by Tommi Jaakkola and Caroline Uhler. I am also affiliated with the Eric and Wendy Schmidt Center (EWSC) at the Broad Institute. My research interests lie broadly in machine learning, representation learning, and AI for science. Recently, my research has focused on multi-modal representation learning and perturbation modelling for drug discovery. Before my PhD, I obtained my Bachelor’s degree from Tsinghua University.