Efficient and decision boundary aware instance selection for support vector machines

Cited by: 30
Authors
Aslani, Mohammad [1]
Seipel, Stefan [1, 2]
Affiliations
[1] Univ Gavle, Dept Comp & Geospatial Sci, Gavle, Sweden
[2] Uppsala Univ, Dept Informat Technol, Div Visual Informat & Interact, Uppsala, Sweden
Keywords
Instance selection; Data reduction; Big data; Support vector machines; Machine learning; Large data sets; Classification; Extraction; Fusion; Tree
DOI
10.1016/j.ins.2021.07.015
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Support vector machines (SVMs) are powerful classifiers, but their high computational complexity in the training phase can limit their applicability to large datasets. An effective way to address this limitation is to select a small subset of the most representative training samples such that desirable results can still be obtained. In this study, a novel instance selection method called border point extraction based on locality-sensitive hashing (BPLSH) is designed. BPLSH preserves instances that are near the decision boundaries and eliminates nonessential ones. The performance of BPLSH is benchmarked against four approaches on different classification problems. The experimental results indicate that BPLSH outperforms the other methods in terms of classification accuracy, preservation rate, and execution time. The source code of BPLSH is available at https://github.com/mohaslani/BPLSH. © 2021 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
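The general idea in the abstract (hash the training set, keep points near class boundaries, thin out the rest) can be illustrated with a small sketch. This is not the published BPLSH procedure, which is available in the authors' repository; it is a simplified heuristic written for illustration only. The function name `lsh_border_selection`, its parameters (`n_hashes`, `bucket_width`, `seed`), and the rule "a bucket containing more than one class is treated as near a boundary" are all assumptions introduced here.

```python
import numpy as np
from collections import defaultdict


def lsh_border_selection(X, y, n_hashes=4, bucket_width=1.0, seed=0):
    """Return sorted indices of a reduced training set.

    Illustrative heuristic only (NOT the published BPLSH algorithm):
    - points are hashed with random projections quantized by bucket_width;
    - a bucket holding more than one class is assumed to lie near a
      decision boundary, so all of its points are kept;
    - a single-class bucket is assumed to be interior, so only its
      first point is kept as a representative.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)

    # Random projection hash: floor((x . a + b) / w) for each hash function.
    A = rng.normal(size=(X.shape[1], n_hashes))
    b = rng.uniform(0.0, bucket_width, size=n_hashes)
    codes = np.floor((X @ A + b) / bucket_width).astype(np.int64)

    # Group sample indices by their combined hash code.
    buckets = defaultdict(list)
    for i, code in enumerate(codes):
        buckets[tuple(code)].append(i)

    keep = []
    for members in buckets.values():
        if len(set(y[members].tolist())) > 1:
            keep.extend(members)      # mixed bucket: keep every point
        else:
            keep.append(members[0])   # pure bucket: keep one representative
    return np.array(sorted(keep))
```

The returned indices could then be used to fit any SVM implementation on the reduced set, for example `X[idx]` and `y[idx]` with scikit-learn's `SVC`. Smaller values of `bucket_width` or more hash functions give finer buckets, which in this sketch tends to keep more points.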
Pages: 579-598
Page count: 20