Towards Scalable Structured Data from Clinical Text

Speaker

Monica Agrawal
MIT CSAIL

Host

David Sontag
MIT CSAIL

Abstract: The data in electronic health records have immense potential to transform medicine both at the point-of-care and through retrospective research. However, structured data alone can tell only a fraction of patients' clinical narratives, as many clinically important variables are trapped within clinical notes. Automated extraction is difficult because clinical notes are written in their own jargon-heavy dialect, patient histories can contain hundreds of notes, and there is often minimal labeled data available. In this talk, I will discuss multiple natural language processing (NLP) solutions to improve the scalability of structuring data from clinical text. These include the design of human-AI teams for efficient clinical annotation, the development of label-efficient modeling methodology, and techniques for leveraging large language models. I will also describe a new paradigm for EHRs that incentivizes the creation of high-quality data at the point-of-care. I will end by discussing future opportunities to leverage modern NLP to impact a variety of healthcare workflows.

Thesis Committee: David Sontag (MIT), Peter Szolovits (MIT), Yoon Kim (MIT), Noemie Elhadad (Columbia)