Simultaneous Feature Selection and Classification for Data-Adaptive Kernel-Penalized SVM

Cited by: 7
Authors
Liu, Xin [1 ]
Zhao, Bangxin [2 ]
He, Wenqing [2 ]
Affiliations
[1] Shanghai Univ Finance & Econ, Sch Stat & Management, Shanghai 200433, Peoples R China
[2] Univ Western Ontario, Dept Stat & Actuarial Sci, London, ON N6A 5B7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
classification; data-adaptive kernel; feature selection; penalty; predictive model; simultaneous classification; support vector machine; SUPPORT VECTOR MACHINES; VARIABLE SELECTION;
DOI
10.3390/math8101846
Chinese Library Classification
O1 [Mathematics];
Discipline codes
0701; 070101;
Abstract
Simultaneous feature selection and classification have been explored in the literature to extend support vector machine (SVM) techniques by adding penalty terms directly to the loss function. However, it is the kernel function that controls the performance of the SVM, and imbalance in the data deteriorates that performance. In this paper, we examine a new method for simultaneous feature selection and binary classification. Instead of penalizing the standard SVM loss function, a penalty is added directly to a data-adaptive kernel function: the kernel of the SVM is first conformally transformed, and an SVM classifier is then refit on the sparse set of selected features. Both convex and non-convex penalties, such as the least absolute shrinkage and selection operator (LASSO), the smoothly clipped absolute deviation (SCAD), and the minimax concave penalty (MCP), are explored, and the oracle property of the estimator is established accordingly. An iterative optimization procedure is applied, as no analytic form of the estimated coefficients is available. Numerical comparisons show that the proposed method outperforms the competitors considered when data are imbalanced, and performs similarly to them when data are balanced. The method can be easily applied to medical images from different platforms.
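To make the core idea concrete, the following is a minimal illustrative sketch, not the authors' algorithm: it applies a conformal transformation K~(x, z) = c(x) c(z) K(x, z) to an RBF kernel (in the spirit of Amari and Wu, 1999) and fits an SVM on the resulting precomputed kernel matrix. The anchor points, the form of c(x), and all parameter values are hypothetical stand-ins; the data-adaptive penalty for feature selection described in the abstract is omitted.

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(X, Z, gamma=1.0):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def conformal_factor(X, centers, tau=0.1):
    # c(x): a data-dependent factor that magnifies the kernel metric
    # near chosen anchor points (hypothetical stand-ins for, e.g.,
    # support vectors near the class boundary). The added 1.0 keeps
    # c(x) bounded away from zero for numerical stability.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return 1.0 + np.exp(-tau * d2).sum(axis=1)

def conformal_kernel(X, Z, centers, gamma=1.0, tau=0.1):
    # Conformal transformation: K~(x, z) = c(x) * K(x, z) * c(z).
    cX = conformal_factor(X, centers, tau)
    cZ = conformal_factor(Z, centers, tau)
    return cX[:, None] * rbf_kernel(X, Z, gamma) * cZ[None, :]

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy binary labels

centers = X[:5]                            # arbitrary anchors for illustration
K = conformal_kernel(X, X, centers)        # transformed Gram matrix
clf = SVC(kernel="precomputed").fit(K, y)  # SVM on the precomputed kernel
acc = clf.score(K, y)                      # training accuracy
```

Since c(x) enters as a diagonal rescaling of the Gram matrix, the transformed kernel remains symmetric and positive semidefinite, which is what allows the standard SVM dual to be reused unchanged.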
Pages: 1-22 (22 pages)