Machine learning model predicts C. difficile infection risk


Every year nearly 30,000 Americans die from an aggressive, gut-infecting bacterium called Clostridium difficile (C. difficile), which is resistant to many common antibiotics and can flourish when antibiotic treatment kills off the beneficial bacteria that normally keep it at bay. Investigators from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts General Hospital (MGH), and the University of Michigan (U-M) have now developed investigational “machine learning” models, specifically tailored to individual institutions, that can predict a patient’s risk of developing C. difficile infection much earlier than it would be diagnosed with current methods. Preliminary data from their study, which is being published today in Infection Control and Hospital Epidemiology, were presented last October at the ID Week 2017 conference.

“Despite substantial efforts to prevent C. difficile infection and to institute early treatment upon diagnosis, rates of infection continue to increase,” says Erica Shenoy, MD, PhD, of the MGH Division of Infectious Diseases, co-senior author of the study and assistant professor of Medicine at Harvard Medical School. “We need better tools to identify the highest risk patients so that we can target both prevention and treatment interventions to reduce further transmission and improve patient outcomes.”

The authors note that most previous models of C. difficile infection risk were designed as “one size fits all” approaches and included only a few risk factors, which limited their usefulness. Co-lead authors Maggie Makar, MS, of CSAIL, and Jeeheh Oh, a U-M graduate student in Computer Science and Engineering, and their colleagues took a “big data” approach that analyzed the whole electronic health record (EHR) to predict a patient’s C. difficile risk throughout the course of hospitalization. Their method allows the development of institution-specific models that could accommodate different patient populations, different EHR systems and factors specific to each institution.

“When data are simply pooled into a one-size-fits-all model, institutional differences in patient populations, hospital layouts, testing and treatment protocols, or even in the way staff interact with the EHR can lead to differences in the underlying data distributions and ultimately to poor performance of such a model,” says Jenna Wiens, PhD, assistant professor of Computer Science and Engineering at U-M and co-senior author of the study. “To mitigate these issues, we take a hospital-specific approach, training a model tailored to each institution.”
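The hospital-specific idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records, the two binary features, the hospital labels "A" and "B", and the choice of a plain logistic regression fitted by gradient descent. It is not the study's code or model; it only shows the general pattern of fitting one model per institution rather than one model on pooled data.

```python
import math
from collections import defaultdict

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a plain logistic regression with batch gradient descent (illustrative only)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_risk(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_per_hospital(records):
    """Group (hospital, features, label) records and fit one model per hospital."""
    grouped = defaultdict(lambda: ([], []))
    for hospital, features, label in records:
        grouped[hospital][0].append(features)
        grouped[hospital][1].append(label)
    return {h: fit_logistic(X, y) for h, (X, y) in grouped.items()}

# Hypothetical toy data: at hospital "A" the first feature tracks infection,
# at hospital "B" the second feature does.
records = [
    ("A", [1, 0], 1), ("A", [1, 0], 1), ("A", [0, 1], 0),
    ("A", [0, 1], 0), ("A", [1, 1], 1), ("A", [0, 0], 0),
    ("B", [0, 1], 1), ("B", [0, 1], 1), ("B", [1, 0], 0),
    ("B", [1, 0], 0), ("B", [1, 1], 1), ("B", [0, 0], 0),
]
models = fit_per_hospital(records)
```

In this toy setup the same patient features receive different risk estimates depending on which hospital's model is applied, which is the kind of institutional difference a pooled model would blur.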

Using their machine-learning-based model, the investigators analyzed de-identified data from the EHRs of almost 257,000 patients admitted to either MGH or Michigan Medicine, U-M’s academic medical center, over periods of two years and six years, respectively. The data included individual patient demographics and medical history, details of admission and daily hospitalization, and the likelihood of exposure to C. difficile. The model generated a daily risk score for each patient and classified the patient as high risk once the score exceeded a set threshold.
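The daily scoring and thresholding step can be sketched as follows. The scores, the threshold value of 0.10, and the function name are hypothetical and chosen only for illustration; the study's actual threshold and scoring scale are not given here.

```python
from typing import Optional, Sequence

def first_high_risk_day(daily_scores: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first hospital day (0-indexed) whose score exceeds the threshold.

    Returns None if the patient is never classified as high risk.
    """
    for day, score in enumerate(daily_scores):
        if score > threshold:
            return day
    return None

# Hypothetical daily risk scores for one admission, one score per hospital day.
scores = [0.02, 0.03, 0.08, 0.15, 0.12, 0.30]
THRESHOLD = 0.10  # assumed value for illustration

flag_day = first_high_risk_day(scores, THRESHOLD)
```

Here the patient would first be classified as high risk on day index 3, the first day the score rises above the assumed threshold.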

Overall, the models were highly successful at predicting which patients would ultimately be diagnosed with C. difficile. In half of those who were infected, accurate predictions could have been made at least five days before diagnostic samples were collected, which would allow highest-risk patients to be the focus of targeted antimicrobial interventions. If validated in prospective studies, the risk prediction score could guide early screening for C. difficile. For patients diagnosed earlier in the course of disease, initiation of treatment could limit the severity of the illness, and patients with confirmed C. difficile could be isolated and contact precautions instituted to prevent transmission to other patients.
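One way to read the five-day figure is as the share of infected patients whose first high-risk flag came at least five days before their diagnostic sample was collected. The sketch below computes that share under stated assumptions: the day indices are invented, the function name is hypothetical, and patients who were never flagged are counted as having no advance warning. It is not the study's evaluation code.

```python
from typing import Optional, Sequence

def share_flagged_early(
    flag_days: Sequence[Optional[int]],
    sample_days: Sequence[int],
    min_lead: int = 5,
) -> float:
    """Fraction of infected patients flagged at least `min_lead` days before sampling.

    flag_days[i] is the day the patient first crossed the risk threshold
    (None if never flagged); sample_days[i] is the day the diagnostic
    sample was collected.
    """
    early = sum(
        1
        for flag, sample in zip(flag_days, sample_days)
        if flag is not None and sample - flag >= min_lead
    )
    return early / len(sample_days)

# Hypothetical data for five infected patients.
flag_days = [2, 1, None, 6, 3]
sample_days = [9, 4, 5, 8, 10]

share = share_flagged_early(flag_days, sample_days)
```

With these invented numbers, two of the five patients were flagged at least five days ahead of sample collection, giving a share of 0.4.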

The research team has made the algorithm code freely available for others to review and adapt for their individual institutions. Shenoy notes that facilities that explore applying similar algorithms to their own institutions will need to assemble the appropriate local subject-matter experts and validate the performance of the models in their institutions.

Study co-author Vincent Young, MD, PhD, the William Henry Fitzbutler Professor in the Department of Internal Medicine at U-M, adds, “This represents a potentially significant advance in our ability to identify and ultimately act to prevent infection with C. difficile. The ability to identify patients at greatest risk could allow us to focus expensive and potentially limited prevention methods on those who would gain the greatest potential benefit. I think that this project is a great example of a ‘team science’ approach to addressing complex biomedical questions to improve healthcare, which I expect to see more of as we enter the era of precision health.”

Additional co-authors of the Infection Control and Hospital Epidemiology paper are Erin E. Ryan, MPH, CCRP, Lauren West, MPH, and David Hooper, MD, MGH Division of Infectious Diseases; Krishna Rao, MD, MS, Laraine Washer, MD, and Vincent Young, MD, PhD, University of Michigan Medical School; John Guttag, PhD, MIT Department of Electrical Engineering and Computer Science; and Christopher Fusco and Robert McCaffrey, Partners HealthCare Information Systems. The study was supported by the MGH-MIT Grand Challenge, National Science Foundation award IIS-1553146, National Institute of Allergy and Infectious Diseases grants U01 AI124255 and K01 AI110524, and a Morton N. Swartz Transformative Scholar Award.