A novel machine learning strategy for model selections - Stepwise Support Vector Machine (StepSVM)

Cited by: 22
Authors
Guo, Chao-Yu [1]
Chou, Yu-Chin [1]
Affiliations
[1] Natl Yang Ming Univ, Sch Med, Inst Publ Hlth, Taipei, Taiwan
Keywords
DOI
10.1371/journal.pone.0238384
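Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract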
An essential aspect of medical research is the prediction of health outcomes and the scientific identification of important factors. As a result, numerous methods for model selection have been developed in recent years. In the era of big data, machine learning has been broadly adopted for data analysis. In particular, the Support Vector Machine (SVM) performs well in classification and prediction with high-dimensional data. This research proposes a novel model selection strategy, the Stepwise Support Vector Machine (StepSVM). The new strategy uses the SVM to conduct a modified stepwise selection, in which the tuning parameter can be determined by 10-fold cross-validation that minimizes the mean squared error. Two popular methods, the conventional stepwise logistic regression model and SVM Recursive Feature Elimination (SVM-RFE), were compared with the StepSVM. The stability and accuracy of the three strategies were evaluated in simulation studies with a complex hierarchical structure. Up to five variables were selected to predict the dichotomous cancer remission of a lung cancer patient. For the stepwise logistic regression, the mean C-statistic was 69.19%. The overall accuracy of the SVM-RFE was estimated at 70.62%. In contrast, the StepSVM provided the highest prediction accuracy, 80.57%. Although the StepSVM is more time-consuming, it is more consistent and outperforms the other two methods.
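The abstract does not spell out the modified stepwise procedure, so the following is only a minimal sketch of the general idea: forward stepwise selection scored by k-fold cross-validation of an SVM, with the regularization parameter picked from a small grid. Everything here is an assumption for illustration and not the authors' implementation: the function names (`train_linear_svm`, `cv_error`, `step_svm`), the Pegasos-style linear SVM without an intercept, the grid `lams`, the stopping rule, and the synthetic data. With labels in {-1, +1}, the misclassification rate used below is proportional to the mean squared error of the predicted labels.

```python
import random

def train_linear_svm(X, y, lam, epochs=20, seed=0):
    """Pegasos-style subgradient descent for a linear SVM (no intercept), labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - 1.0 / t) * wj for wj in w]  # shrinkage from the L2 penalty
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

def cv_error(X, y, cols, lam, k=10, seed=0):
    """k-fold cross-validated misclassification rate using only the columns in `cols`."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    wrong = 0
    for f in range(k):
        test = set(idx[f::k])
        X_train = [[X[i][c] for c in cols] for i in idx if i not in test]
        y_train = [y[i] for i in idx if i not in test]
        w = train_linear_svm(X_train, y_train, lam)
        wrong += sum(predict(w, [X[i][c] for c in cols]) != y[i] for i in test)
    return wrong / len(X)

def step_svm(X, y, max_features=5, lams=(0.01, 0.1), k=10):
    """Forward selection: add the variable that most lowers the CV error, stop when none does."""
    selected, best_err = [], float("inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        # For each candidate, keep the best error over the tuning grid.
        trials = [(min(cv_error(X, y, selected + [c], lam, k) for lam in lams), c)
                  for c in remaining]
        err, c = min(trials)
        if err >= best_err:  # no strict improvement: stop
            break
        selected.append(c)
        remaining.remove(c)
        best_err = err
    return selected, best_err

# Synthetic data (hypothetical): column 0 determines the label, columns 1-2 are noise.
rng = random.Random(1)
y = [1 if i % 2 == 0 else -1 for i in range(60)]
X = [[yi * rng.uniform(0.5, 2.0), rng.uniform(-1, 1), rng.uniform(-1, 1)] for yi in y]
selected, err = step_svm(X, y)
print(selected, err)
```

The cap of five variables mirrors the "up to five variables" in the abstract, and `k=10` mirrors its 10-fold cross-validation. Refitting an SVM for every candidate variable at every step is what makes this kind of search slower than a single stepwise regression or one RFE pass.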
Pages: 18