Application of Machine Learning Models for Survival Prognosis in Breast Cancer Studies

Cited by: 23
Authors
Mihaylov, Iliyan [1]
Nisheva, Maria [1, 2]
Vassilev, Dimitar [1]
Affiliations
[1] Sofia Univ St Kliment Ohridski, Fac Math & Informat, 5 James Bourchier Blvd, Sofia 1164, Bulgaria
[2] Bulgarian Acad Sci, Inst Math & Informat, Acad G Bonchev Str,Block 8, BU-1113 Sofia, Bulgaria
Keywords
bioinformatics; machine learning; breast cancer; survival time prognosis; cross-validation; NEURAL-NETWORK
DOI
10.3390/info10030093
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The application of machine learning models for prediction and prognosis of disease development has become an integral part of cancer studies aimed at improving the subsequent therapy and management of patients. The main objective of the presented study is the accurate prediction of survival time in breast cancer on the basis of clinical data using machine learning models. The paper discusses an approach in which the main factor used to predict survival time is an originally developed tumor-integrated clinical feature, which combines tumor stage, tumor size, and age at diagnosis. Two datasets from corresponding breast cancer studies are merged through horizontal and vertical data integration, using document-oriented and graph databases, which show good performance and no data loss. Aside from data normalization and classification, the applied machine learning methods provide promising results in terms of accuracy of survival time prediction. The analysis of the experiments shows an advantage for linear Support Vector Regression, Lasso regression, Kernel Ridge regression, K-nearest-neighbors regression, and Decision Tree regression; these models achieve the most accurate survival prognosis results. Cross-validation for accuracy demonstrates the best performance of the same models on the studied breast cancer data. In support of the proposed approach, a Python-based workflow has been developed, and plans for its further improvement are discussed at the end of the paper.
Pages: 13
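The abstract names five regression models and a cross-validation step but gives no implementation details in this record. The following is a minimal sketch of how such a comparison could look, assuming scikit-learn is used. The data are synthetic and generated inside the script, the three raw inputs (tumor stage, tumor size, age at diagnosis) stand in for the paper's tumor-integrated clinical feature, whose formula is not given here, and the function names, hyperparameters, and the mean absolute error metric are all illustrative assumptions rather than the authors' workflow.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR
from sklearn.tree import DecisionTreeRegressor


def make_synthetic_data(n_samples=200, seed=0):
    """Generate SYNTHETIC data (an assumption, not the paper's datasets)."""
    rng = np.random.default_rng(seed)
    stage = rng.integers(1, 5, size=n_samples)        # tumor stage 1..4
    size_mm = rng.uniform(5, 60, size=n_samples)      # tumor size in mm
    age = rng.uniform(30, 85, size=n_samples)         # age at diagnosis
    noise = rng.normal(0, 10, size=n_samples)
    # Hypothetical linear relation, chosen only so the models have signal to fit.
    survival = 150 - 20 * stage - 0.8 * size_mm - 0.3 * age + noise
    survival = np.clip(survival, 1, None)             # keep survival months positive
    X = np.column_stack([stage, size_mm, age])
    return X, survival


def compare_models(X, y, n_splits=5, seed=0):
    """Return the k-fold cross-validated mean absolute error per model."""
    # Hyperparameters below are illustrative defaults, not tuned values.
    models = {
        "LinearSVR": LinearSVR(max_iter=10000, random_state=seed),
        "Lasso": Lasso(alpha=0.1),
        "KernelRidge": KernelRidge(alpha=1.0, kernel="rbf"),
        "KNeighbors": KNeighborsRegressor(n_neighbors=5),
        "DecisionTree": DecisionTreeRegressor(max_depth=4, random_state=seed),
    }
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        # Scaling sits inside the pipeline so it is refit on each training fold.
        pipeline = make_pipeline(StandardScaler(), model)
        scores = cross_val_score(
            pipeline, X, y, cv=cv, scoring="neg_mean_absolute_error"
        )
        results[name] = -scores.mean()
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    ranked = sorted(compare_models(X, y).items(), key=lambda kv: kv[1])
    for name, mae in ranked:
        print(f"{name:>12}: MAE = {mae:.2f} months")
```

Placing the scaler inside the pipeline keeps normalization statistics from leaking between training and validation folds, which matters when cross-validation scores are used to rank models.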