Project

AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus

Adding domain knowledge to word embeddings.

In recent years, word vectors have been surprisingly effective at capturing intuitive characteristics of the words they represent. These vectors achieve the best results when training corpora are extremely large, sometimes billions of words. Clinical natural language processing datasets, however, tend to be much smaller. Even the largest publicly available dataset of medical notes is three orders of magnitude smaller than the corpus behind the oft-used "Google News" word vectors. To make up for limited training data, we encode expert domain knowledge into our embeddings. Building on a previous extension of word2vec, we show that generalizing the notion of a word's "context" to include arbitrary features creates an avenue for encoding domain knowledge into word embeddings. We show that this method produces word vectors that are strictly better than their text-only counterparts when evaluated against the judgment of clinical experts.
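To make the idea of a generalized "context" concrete, here is a minimal sketch, assuming a word2vec variant that trains on explicit (word, context) pairs rather than on raw text. Ordinary window neighbors become contexts as usual, and each word additionally receives one context per metathesaurus concept it maps to. The function `make_pairs`, the `CUI=` prefix, and the concept identifiers in the usage example are all hypothetical illustrations, not the project's actual pipeline or real UMLS identifiers.

```python
from typing import Dict, List, Tuple


def make_pairs(
    tokens: List[str],
    window: int,
    concepts: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Build (word, context) training pairs from one tokenized note.

    Contexts are (a) neighboring words within `window` positions and
    (b) hypothetical concept IDs looked up in `concepts`, prefixed with
    "CUI=" so they never collide with ordinary word contexts.
    """
    pairs: List[Tuple[str, str]] = []
    for i, word in enumerate(tokens):
        # Standard skip-gram style window contexts.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((word, tokens[j]))
        # Extra "domain knowledge" contexts from the concept mapping.
        for cui in concepts.get(word, []):
            pairs.append((word, "CUI=" + cui))
    return pairs


if __name__ == "__main__":
    note = ["patient", "denies", "chest", "pain"]
    toy_concepts = {"pain": ["C_PAIN"]}  # hypothetical concept ID
    for w, c in make_pairs(note, window=1, concepts=toy_concepts):
        print(w, c)
```

Because the concept contexts share the same training objective as word contexts, two words that rarely co-occur in a small corpus but map to the same concept are pushed toward similar vectors. This is one plausible way to realize the generalization described above, not necessarily the one the project used.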

Group

Clinical Decision-Making Group

Communities

Cognitive AI Community of Research
Applied Machine Learning Community of Research

Contact us

If you would like to contact us about our work, please refer to our members below and reach out to one of the group leads directly.

Last updated Apr 24 '20

Research Areas

AI & ML

Impact Areas

Health Care

Members

Peter Szolovits