PI
Core/Dual

Peter Szolovits

Professor

Publications

Projects

Project

Quantifying Racial Disparities in End-of-Life Care

When discussing racial disparities in medical treatments, critics often cite social factors as confounders that explain away any differences. Comparing the health of whites to that of non-whites, we do see that environmental and social factors conspire to yield higher rates of disease and shorter life spans in non-white populations. But does that really show that medical treatment itself is free from bias? We examine end-of-life care in the ICU, stratified by ethnicity and controlled for acuity using severity assessment scores. Our analysis agrees with previous studies that non-whites tend to receive more aggressive (high-risk, high-reward) treatments, such as mechanical ventilation, than whites, despite receiving comparable or moderately fewer noninvasive treatments. Going further, we show that, using treatment patterns and clinical notes, we are able to infer a patient's race. Finally, we show evidence suggesting that non-whites have a much greater distrust of the medical community than whites do. We find that race, even in the great equalizer of end-of-life care, continues to influence the treatments administered to a patient.

Project

CliNER: Clinical Concept Extraction

Clinical concept extraction (CCE) of named entities - such as problems, tests, and treatments - aids in forming an understanding of notes and provides a foundation for many downstream clinical decision-making tasks. Historically, this task has been posed as a standard named entity recognition (NER) sequence tagging problem and solved with feature-based methods using hand-engineered domain knowledge. Recent advances, however, have demonstrated the efficacy of LSTM-based models for NER tasks, including CCE. This work presents CliNER 2.0, a simple-to-install, open-source tool for extracting concepts from clinical text. CliNER 2.0 uses a word- and character-level LSTM model and achieves state-of-the-art performance. For ease of use, the tool also includes pre-trained models available for public use.

Project

Information Retrieval for Cancer Treatments in Clinical Literature and Trial Eligibility

This project takes a "precision medicine" approach to finding relevant cancer treatments in the clinical literature and eligible trials. For a given patient with associated demographics (age, gender) and disease (cancer type, genetic variants), we query a database of all PubMed articles and ClinicalTrials.gov trials using NLP techniques to find the most useful and relevant treatments for the patient. Our ensemble-based system performed very well in the TREC 2016 Precision Medicine challenge.

Project

Synthetically-Identified Clinical Notes

Clinical notes often describe the most important aspects of a patient's physiology and are therefore critical to medical research. However, these notes are typically inaccessible to researchers without prior removal of sensitive protected health information (PHI), a natural language processing (NLP) task referred to as de-identification. To build tools that perform de-identification, one typically needs the very same data that is private, which creates a chicken-and-egg problem. In this work, we generate "fake" clinical notes in which the de-identified information is replaced with real-seeming values (e.g. "Tim Lywood" instead of "George Beveridge") that still respect reasonable distributional semantics. We evaluate models trained on this synthetic data and show that they perform just as well as models trained on the sensitive PHI-bearing notes.

Groups

Community of Research

Applied Machine Learning Community of Research

This CoR brings together researchers at CSAIL working across a broad swath of application domains. Within these lie novel and challenging machine learning problems serving science, social science and computer science.

Community of Research

Cognitive AI Community of Research

This CoR aims to develop AI technology that synthesizes symbolic reasoning, probabilistic reasoning for dealing with uncertainty in the world, and statistical methods for extracting and exploiting regularities in the world, into an integrated picture of intelligence that is informed by computational insights and by cognitive science.

News