Two-phase optimization for support vectors and parameter selection of support vector machines: Two-class classification

Cited by: 15

Authors:

Wu, Shinq-Jen [1]

Van-Hung Pham [1]

Thi-Nga Nguyen [1]

Affiliations:

[1] Da Yeh Univ, Dept Elect Engn, Changhua, Taiwan

Source:

APPLIED SOFT COMPUTING | 2017 / Vol. 59

Keywords:

Classification; Systems biology; Mimetic computation; Working set selection; PARTICLE SWARM OPTIMIZATION; ALGORITHM;

DOI:

10.1016/j.asoc.2017.05.021

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Support vector machines (SVMs) are one of the most popular classification tools and show the most potential to address under-sampled noisy data (a large number of features and a relatively small number of samples). However, the computational cost is too expensive, even for modern-scale samples, and the performance largely depends on the proper setting of parameters. As the data scale increases, the improvement in speed becomes increasingly challenging. As the dimension (feature number) largely increases while the sample size remains small, the avoidance of overfitting becomes a significant challenge. In this study, we propose a two-phase sequential minimal optimization (TSMO) to largely reduce the training cost for large-scale data (tested with 3186-70,000-sample datasets) and a two-phased-in differential-learning particle swarm optimization (tDPSO) to ensure the accuracy for under-sampled data (tested with 2000-24481-feature datasets). Because the purpose of training SVMs is to identify support vectors that denote a hyperplane, TSMO is developed to quickly select support vector candidates from the entire dataset and then identify support vectors from those candidates. In this manner, the computational burden is largely reduced (a 29.4%-65.3% reduction rate). The proposed tDPSO uses topology variation and differential learning to solve PSO's premature convergence issue. Population diversity is ensured through dynamic topology until a ring connection is achieved (topology-variation phases). Further, particles initiate chemo-type simulated-annealing operations, and the global-best particle takes a two-turn diversion in response to stagnation (event-induced phases). The proposed tDPSO-embedded SVMs were tested with several under-sampled noisy cancer datasets and showed superior performance over various methods, even those methods with feature selection for the preprocessing of data. (C) 2017 Elsevier B.V. All rights reserved.
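The abstract notes that tDPSO varies the swarm topology until a ring connection is reached. As a rough illustration of that end state only, the following is a minimal sketch of a standard local-best PSO on a fixed ring, where each particle learns from itself and its two index neighbours. It is not the authors' tDPSO: the topology-variation schedule, differential learning, and simulated-annealing steps are omitted. The function names (`ring_pso`, `sphere`), the coefficient values, and the toy objective are all assumptions made for illustration; in an SVM setting the objective would more plausibly be a cross-validation error over the SVM parameters.

```python
import random


def ring_pso(objective, dim, n_particles=20, iters=200, bounds=(-5.0, 5.0),
             w=0.72, c1=1.49, c2=1.49, seed=0):
    """Minimize `objective` with a local-best PSO on a fixed ring topology.

    Illustrative only: the coefficients w, c1, c2 are common textbook
    choices, not values taken from the paper.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]      # personal best values

    for _ in range(iters):
        for i in range(n_particles):
            # Ring neighbourhood: the particle and its two adjacent indices.
            hood = ((i - 1) % n_particles, i, (i + 1) % n_particles)
            best = min(hood, key=lambda j: pbest_val[j])
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (pbest[best][d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val

    g = min(range(n_particles), key=lambda j: pbest_val[j])
    return pbest[g], pbest_val[g]


def sphere(x):
    """Toy objective with its minimum at the origin (stand-in for CV error)."""
    return sum(v * v for v in x)


best_x, best_val = ring_pso(sphere, dim=2)
```

Because information spreads only between adjacent particles, a ring tends to converge more slowly than a fully connected swarm but keeps the population more diverse, which is the property the abstract associates with avoiding premature convergence.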

Citation

Pages: 129-142

Page count: 14

References (40 in total)

[1] Al-Dulaimi H.B., 2011, INT J COMPUT, V7, P50

[2] [Anonymous], 1999, Random forests-random features

[3] Working set selection using functional gain for LS-SVM [J].