Model Comparison for Breast Cancer Prognosis Based on Clinical Data

Cited by: 35
Authors
Boughorbel, Sabri [1]
Al-Ali, Rashid [1]
Elkum, Naser [2]
Affiliations
[1] Sidra Med & Res Ctr, Biomed Informat Div, Doha, Qatar
[2] Sidra Med & Res Ctr, Div Clin Epidemiol, Doha, Qatar
Keywords
STATISTICS;
DOI
10.1371/journal.pone.0146413
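Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code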
07 ; 0710 ; 09 ;
Abstract
We compared the performance of several prediction techniques for breast cancer prognosis, based on the area under the ROC curve (AU-ROC) for different prognosis periods. The analyzed dataset contained 1,981 patients, and from an initial 25 variables the 11 most common clinical predictors were retained. We compared eight models from a wide spectrum of predictive models, namely: Generalized Linear Model (GLM), GLM-Net, Partial Least Squares (PLS), Support Vector Machines (SVM), Random Forests (RF), Neural Networks, k-Nearest Neighbors (k-NN) and Boosted Trees. To compare these models, a paired t-test was applied to the model performance differences obtained from data resampling. Random Forests, Boosted Trees, Partial Least Squares and GLM-Net have superior overall performance; however, their performance is only slightly higher than that of the other models. The comparative analysis also allowed us to define a relative variable importance as the average of the variable importance from the different models. Two sets of variables are identified from this analysis. The first includes number of positive lymph nodes, tumor size, cancer grade and estrogen receptor, all of which have an important influence on model predictability. The second set includes variables related to histological parameters and treatment types. The short-term vs. long-term contributions of the clinical variables are also analyzed from the comparative models. Among the various cancer treatment plans, the combination of chemotherapy and radiotherapy has the largest impact on cancer prognosis.
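The two comparison steps named in the abstract, a paired t-test on per-resample performance differences and a relative variable importance averaged across models, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the AU-ROC values and the importance scores below are invented for illustration, and rescaling each model's importances to sum to 1 before averaging is an assumption, since the abstract does not say how scores from different models were made comparable.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-resample scores.

    Each position in the two lists must come from the same resample,
    so the test is run on the per-resample differences.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two resamples")
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / sqrt(n)), n - 1


def relative_importance(per_model):
    """Average variable importance across models.

    Assumption: each model's scores are first rescaled to sum to 1 so
    that models with different importance scales contribute equally.
    """
    variables = sorted({v for scores in per_model.values() for v in scores})
    averaged = {}
    for var in variables:
        shares = []
        for scores in per_model.values():
            total = sum(scores.values())
            shares.append(scores.get(var, 0.0) / total)
        averaged[var] = mean(shares)
    return averaged


# Hypothetical AU-ROC values from five resamples (not data from the paper).
auc_rf = [0.74, 0.76, 0.73, 0.75, 0.77]
auc_knn = [0.71, 0.74, 0.72, 0.72, 0.75]
t_stat, dof = paired_t_statistic(auc_rf, auc_knn)

# Hypothetical raw importance scores from two models.
importance = relative_importance({
    "rf": {"lymph_nodes": 4.0, "tumor_size": 3.0, "grade": 1.0},
    "glm": {"lymph_nodes": 2.0, "tumor_size": 1.0, "grade": 1.0},
})
```

The sketch stops at the t statistic and its degrees of freedom; turning these into a p-value requires the Student t distribution, which a statistics library would normally supply.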
Pages: 15