I’m a second-year PhD student on an NSF Fellowship in the Clinical Decision Making Group (MEDG). My work focuses on the intersection of NLP for clinical notes and social biases in clinical care.

Quantifying Racial Disparities in End-of-Life Care

When discussing racial disparities in medical treatment, critics often cite social factors as confounders that explain away any differences. Comparing the health of whites to that of nonwhites, we do see that environmental and social factors conspire to yield higher rates of disease and shorter life spans in nonwhite populations. But does that really show that medical treatment itself is free from bias? We examine end-of-life care in the ICU, stratified by ethnicity and controlled for acuity using severity assessment scores. Our analysis agrees with previous studies that nonwhites tend to receive more aggressive (high-risk, high-reward) treatments, such as mechanical ventilation, than whites, despite receiving comparable or moderately fewer noninvasive treatments. Going further, we show that a patient's race can be inferred from treatment patterns and clinical notes. Finally, we show evidence suggesting that nonwhites have a much greater distrust of the medical community than whites do. We find that race, even in the great equalizer of end-of-life care, continues to influence the treatments administered to a patient.


CliNER: Clinical Concept Extraction

Clinical concept extraction (CCE) of named entities, such as problems, tests, and treatments, aids in forming an understanding of notes and provides a foundation for many downstream clinical decision-making tasks. Historically, this task has been posed as a standard named entity recognition (NER) sequence tagging problem and solved with feature-based methods using hand-engineered domain knowledge. Recent advances, however, have demonstrated the efficacy of LSTM-based models for NER tasks, including CCE. This work presents CliNER 2.0, a simple-to-install, open-source tool for extracting concepts from clinical text. CliNER 2.0 uses a word- and character-level LSTM model and achieves state-of-the-art performance. For ease of use, the tool also includes pre-trained models available for public use.
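
To make the sequence-tagging framing concrete, here is a minimal sketch of how per-token tags can be turned back into concept spans. It assumes a BIO tagging scheme with the labels problem, test, and treatment; the function name `bio_to_spans` and the example sentence are hypothetical and are not part of CliNER's actual interface.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (concept text, concept type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    def flush() -> None:
        # Emit the span collected so far, if any.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # A "B-" tag (or a stray "I-" of a different type) opens a new span.
            flush()
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the span that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag: the token is outside any concept.
            flush()
            current_tokens = []
            current_type = None
    flush()
    return spans


# Hypothetical example sentence with hand-written tags.
tokens = ["Patient", "has", "chest", "pain", ";", "ordered",
          "chest", "x-ray", "and", "started", "aspirin"]
tags = ["O", "O", "B-problem", "I-problem", "O", "O",
        "B-test", "I-test", "O", "O", "B-treatment"]
concepts = bio_to_spans(tokens, tags)
```

In a real pipeline the tags would come from the trained tagger rather than being written by hand; the decoding step is the same either way.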


Information Retrieval for Cancer Treatments in Clinical Literature and Trial Eligibility

We take a "precision medicine" approach to finding relevant cancer treatments in clinical literature and eligible trials. For a given patient with associated demographics (age, gender) and disease (cancer type, genetic variants), we query a database of all PubMed articles and trials using NLP techniques to find the most useful and relevant treatments for the patient. Our ensemble-based system performed very well in the TREC 2016 Precision Medicine challenge.


Synthetically-Identified Clinical Notes

Clinical notes often describe the most important aspects of a patient's physiology and are therefore critical to medical research. However, these notes are typically inaccessible to researchers without prior removal of sensitive protected health information (PHI), a natural language processing (NLP) task referred to as de-identification. To build tools that perform de-identification, one typically needs the very same data that is private, which creates a chicken-and-egg problem. In this work, we generate "fake" clinical notes in which the de-identified information is replaced with real-seeming values (e.g. "Tim Lywood" instead of "George Beveridge") that still respect reasonable distributional semantics. We evaluate models trained on this synthetic data and show that they perform just as well as models trained on the sensitive PHI-bearing notes.
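
As a rough illustration of the replacement step, the sketch below fills PHI placeholders with surrogate values. It rests on several assumptions that are not from the project itself: the `[**TYPE**]` placeholder format, the tiny surrogate lists (apart from "Tim Lywood", every value is made up), and uniform random sampling, which is a simplification that does not try to respect the distributional semantics described above.

```python
import random
import re

# Hypothetical surrogate pools; a real system would draw from much larger,
# distribution-aware sources.
SURROGATES = {
    "NAME": ["Tim Lywood", "Ana Pereira", "John Okafor"],
    "HOSPITAL": ["Riverside General", "Lakeview Medical Center"],
    "DATE": ["2012-03-14", "2015-11-02"],
}

# Assumed placeholder format: [**NAME**], [**HOSPITAL**], [**DATE**].
PLACEHOLDER = re.compile(r"\[\*\*(NAME|HOSPITAL|DATE)\*\*\]")


def fill_placeholders(note: str, rng: random.Random) -> str:
    """Replace each PHI placeholder with a randomly chosen surrogate value."""

    def pick(match: re.Match) -> str:
        # The captured group is the PHI type, which selects the surrogate pool.
        return rng.choice(SURROGATES[match.group(1)])

    return PLACEHOLDER.sub(pick, note)


note = "[**NAME**] was admitted to [**HOSPITAL**] on [**DATE**]."
synthetic = fill_placeholders(note, random.Random(0))
```

Passing a seeded `random.Random` keeps the output reproducible, which makes it easier to compare models trained on different synthetic versions of the same notes.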
