Weighted K-means support vector machine for cancer prediction

Cited by: 21
Author
Kim, SungHwan [1]
Affiliation
[1] Korea Univ, Dept Stat, Seoul 136701, South Korea
Source
SPRINGERPLUS | 2016 / Vol. 5
Keywords
Support vector machine; K-means clustering; Weighted SVM; TCGA; BREAST-CANCER; RECURRENCE; TAMOXIFEN; RISK;
DOI
10.1186/s40064-016-2677-4
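Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract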
To date, the support vector machine (SVM) has been widely applied in diverse biomedical fields to address disease subtype identification and the pathogenicity of genetic variants. In this paper, I propose the weighted K-means support vector machine (wKM-SVM) and the weighted support vector machine (wSVM), in which the SVM is allowed to impose weights on the loss term. In addition, I demonstrate the numerical relations between the objective function of the SVM and the weights. Motivated by general ensemble techniques, which are known to improve accuracy, I directly adapt the boosting algorithm to the newly proposed wKM-SVM (and wSVM). In terms of predictive performance, a range of simulation studies demonstrates that the wKM-SVM (and wSVM) with boosting outperforms the standard KM-SVM (and SVM) as well as many popular classification rules. I applied the proposed methods to simulated data and to two large-scale real applications using TCGA pan-cancer methylation data for breast and kidney cancer. In conclusion, the wKM-SVM (and wSVM) increases the accuracy of the classification model and will facilitate disease diagnosis and clinical treatment decisions to benefit patients. A software package (wSVM) is publicly available at the R-project webpage (https://www.r-project.org).
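For orientation, "imposing weights on the loss term" can be illustrated with one common soft-margin formulation of a weighted SVM. This is a sketch under assumptions, not necessarily the exact objective used in the paper: the symbols $\beta$, $\beta_0$, $\xi_i$, $C$, and the per-observation weights $w_i \ge 0$ are notation chosen here for illustration.

```latex
\min_{\beta,\,\beta_0,\,\xi}\;
  \frac{1}{2}\lVert \beta \rVert^{2} + C \sum_{i=1}^{n} w_i\,\xi_i
\quad \text{subject to} \quad
  y_i\left(\beta^{\top} x_i + \beta_0\right) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \qquad i = 1,\dots,n .
```

Setting every $w_i = 1$ recovers the standard soft-margin SVM, so the weights only rescale how strongly each observation's margin violation $\xi_i$ is penalized.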
Pages: 11