Fusing heterogeneous data to improve breast cancer diagnosis

Speaker: Jonathan Jesneck, Duke University
Date: February 9 2007
Time: 2:00PM to 3:00PM
Location: 32-D507
Host: Polina Golland, CSAIL
Contact: Polina Golland, x38005, polina@mit.edu
This talk will focus on two breast cancer projects: fusing
heterogeneous information and discovering blood protein biomarkers.
1) Optimized decision fusion of heterogeneous data for breast cancer
diagnosis.
As more diagnostic testing options become available to physicians, it
becomes more difficult to combine various types of medical information
together in order to optimize the overall diagnosis. To improve
diagnostic performance, here we introduce an approach to optimize a
decision-fusion technique to combine heterogeneous information, such
as from different modalities, feature categories, or institutions. For
classifier comparison we used two performance metrics: the area under
the receiver operating characteristic (ROC) curve (AUC) and the
normalized partial area under the curve (pAUC). This study used four
classifiers: linear discriminant analysis (LDA), an artificial neural
network (ANN), and two variants of our decision-fusion technique,
AUC-optimized (DF-A) and pAUC-optimized (DF-P) decision fusion. We
applied each of these classifiers with 100-fold cross-validation to
two heterogeneous breast cancer data sets: one of
mass lesion features and a much more challenging one of
microcalcification lesion features. For the calcification data set,
decision fusion outperformed the other classifiers in terms of both
AUC (p < 0.02) and pAUC (p < 0.01). For the mass data set, DF-A
outperformed both the ANN and the LDA. Although for this data set
there were no statistically significant differences among the
classifiers' pAUC values, the DF-P did significantly improve
specificity versus the LDA at both 98% and 100% sensitivity (p <
0.04). In conclusion, decision fusion directly optimized clinically
significant performance measures, such as AUC and pAUC, and sometimes
outperformed two well-known machine-learning techniques when applied
to two different breast cancer data sets.
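The two metrics named above can be illustrated with a minimal sketch.
This is not the speaker's implementation: the function names are
hypothetical, and the pAUC normalization is an assumption (the abstract
does not define it). Here pAUC is taken as the ROC area in the region
where sensitivity is at least min_sensitivity, divided by
(1 - min_sensitivity) so that a perfect classifier scores 1.

```python
def specificity_per_positive(scores, labels):
    """For each positive case (highest score first), return the fraction of
    negative cases scored below it; ties count as one half."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return [sum((n < p) + 0.5 * (n == p) for n in neg) / len(neg) for p in pos]


def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    spec = specificity_per_positive(scores, labels)
    return sum(spec) / len(spec)


def normalized_pauc(scores, labels, min_sensitivity=0.9):
    """Partial AUC over sensitivities in [min_sensitivity, 1].

    Assumed normalization: divide by (1 - min_sensitivity), so the result
    lies in [0, 1]. The empirical ROC curve is treated as a step function.
    """
    spec = specificity_per_positive(scores, labels)
    n_pos = len(spec)
    area = 0.0
    for i, s in enumerate(spec):
        # The i-th positive moves sensitivity from i/n_pos to (i+1)/n_pos.
        lo, hi = i / n_pos, (i + 1) / n_pos
        overlap = max(0.0, hi - max(lo, min_sensitivity))
        area += overlap * s
    return area / (1.0 - min_sensitivity)
```

Both functions take a list of classifier scores and a parallel list of
labels (1 for malignant, 0 for benign). Because pAUC looks only at the
high-sensitivity region, two classifiers with similar AUC can differ
noticeably in pAUC.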
2) Bayesian methods for discovery of circulating markers for breast
cancer diagnosis.
Although mammography is currently the preferred screening method for
breast cancer, it suffers from only moderate sensitivity
(approximately 70%) and high false positive rates. Only 13-29% of
suspicious masses are determined to be malignant. To improve the
diagnosis rate for breast cancer screening, we can provide more
information by measuring breast blood protein levels. This study
enrolled 122 women undergoing diagnostic biopsy at Duke University
Medical Center and the University of Pittsburgh between 2000 and 2005.
Blood sera were assayed for 98 different biomarkers with the Luminex
ELISA platform and reagents. To select biomarkers indicative of
malignancy, we applied an iterative Bayesian model averaging
technique. The final set of selected models included the features
MIF, patient age, Haptoglobin, Apolipoprotein E (Apo E), MMP-9, EGFR,
and ACTH. The classifier achieved an area under the receiver operating
characteristic (ROC) curve of 0.82. At 80% sensitivity, this
technique could obviate 40% of unnecessary biopsies.
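As a rough illustration of how a figure such as "40% of unnecessary
biopsies avoided at 80% sensitivity" can be read off a classifier's
scores, the sketch below picks the score threshold that still flags the
target fraction of malignant cases and counts the benign cases falling
below it. The function name and the thresholding rule are assumptions
for illustration, not the study's procedure.

```python
import math


def biopsies_avoided_at_sensitivity(scores, labels, target_sensitivity=0.8):
    """Return (threshold, fraction of benign cases below the threshold).

    labels: 1 = malignant, 0 = benign. Cases scoring at or above the
    threshold are sent to biopsy; benign cases below it are spared.
    """
    malignant = sorted((s for s, y in zip(scores, labels) if y == 1),
                       reverse=True)
    benign = [s for s, y in zip(scores, labels) if y == 0]
    # Number of malignant cases that must be flagged to reach the target
    # (the small epsilon guards against floating-point round-up).
    k = max(1, math.ceil(target_sensitivity * len(malignant) - 1e-9))
    threshold = malignant[k - 1]
    spared = sum(s < threshold for s in benign)
    return threshold, spared / len(benign)
```

The returned fraction is the specificity at the chosen operating point,
which in this setting equals the share of benign biopsies that would not
have been performed.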