Developing a Novel Machine Learning-Based Classification Scheme for Predicting SPCs in Breast Cancer Survivors

Cited by: 25
Authors
Chang, Chi-Chang [1, 2]
Chen, Ssu-Han [3 ]
Affiliations
[1] Chung Shan Med Univ, Sch Med Informat, Taichung, Taiwan
[2] Chung Shan Med Univ Hosp, IT Off, Taichung, Taiwan
[3] Ming Chi Univ Technol, Dept Ind Engn & Management, New Taipei, Taiwan
Keywords
second primary cancers (SPCs); breast cancer; machine learning; classification; machine learning-based classification scheme; RISK-FACTORS; REDUCTION; ALGORITHM;
DOI
10.3389/fgene.2019.00848
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Due to the high effectiveness of cancer screening and therapies, the diagnosis of second primary cancers (SPCs) has increased in women with breast cancer. The present study was conducted to develop a novel machine learning-based classification scheme for predicting the risk factors of SPCs in breast cancer survivors. The proposed scheme was based on the XGBoost classifier with the following four comparable strategies: transformation, resampling, clustering, and ensemble learning, to improve the training balanced accuracy. Results suggested that the best prediction accuracy for an empirical case is the XGBoost associated with the strategies of resampling and clustering. The experimental results showed that age, sequence of radiotherapy and surgery, surgical margins of the primary site, human epidermal growth factor, high-dose clinical target volume, and estrogen receptors are relatively more important risk factors associated with SPCs in patients with breast cancer. These risk factors should be monitored for the early detection of breast cancer. In conclusion, the proposed scheme can support the important influence of personality and clinical symptom representations in all phases of the primary treatment trajectory. Our results further suggested that adaptive machine learning techniques require the incorporation of significant variables for optimal predictions.
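The abstract names balanced accuracy as the training objective and resampling as one of the strategies for class imbalance. The following is a minimal sketch of those two ideas only, written in plain Python on made-up toy labels. It is not the authors' pipeline: the function names `balanced_accuracy` and `random_oversample` are hypothetical, random oversampling is used as a simple stand-in for whatever resampling method the study applied, and the XGBoost classifier and clustering step are omitted.

```python
import random
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall (balanced accuracy)."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all samples whose true label is this class.
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest one.

    Illustrative stand-in for the resampling strategy; not the study's method.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Toy data (invented for illustration): 9 negatives, 1 positive.
X_toy = [[i] for i in range(10)]
y_toy = [0] * 9 + [1]

# A classifier that always predicts the majority class looks good on plain
# accuracy (9 of 10 correct) but poor on balanced accuracy.
y_majority = [0] * 10
score = balanced_accuracy(y_toy, y_majority)

# After oversampling, both classes have the same number of training rows.
X_bal, y_bal = random_oversample(X_toy, y_toy)
class_counts = Counter(y_bal)
```

On imbalanced data such as rare SPC cases, plain accuracy rewards predicting the majority class, which is why a per-class average is the more informative score in this setting.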
Pages: 6