Data selection based on decision tree for SVM classification on large data sets

Cited by: 59
Authors
Cervantes, Jair [1 ]
Garcia Lamont, Farid [1 ]
Lopez-Chau, Asdrubal [2 ]
Rodriguez Mazahua, Lisbeth [3 ]
Sergio Ruiz, J. [1 ]
Affiliations
[1] CU UAEM Texcoco, Fracc El Tejocote, Texcoco, Mexico
[2] CU UAEM Zumpango, Zumpango 55600, Estado de Mexic, Mexico
[3] Inst Tecnol Orizaba, Div Res & Postgrad Studies, Orizaba 9432, Veracruz, Mexico
Keywords
SVM; Classification; Large data sets; SUPPORT VECTOR MACHINES; ALGORITHM; PROPERTY;
DOI
10.1016/j.asoc.2015.08.048
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Support Vector Machine (SVM) has important properties such as a strong mathematical background and a better generalization capability with respect to other classification methods. On the other hand, the major drawback of SVM occurs in its training phase, which is computationally expensive and highly dependent on the size of input data set. In this study, a new algorithm to speed up the training time of SVM is presented; this method selects a small and representative amount of data from data sets to improve training time of SVM. The novel method uses an induction tree to reduce the training data set for SVM, producing a very fast and high-accuracy algorithm. According to the results, the proposed algorithm produces results with similar accuracy and in a faster way than the current SVM implementations. (C) 2015 Elsevier B.V. All rights reserved.
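The abstract does not spell out how the induction tree selects data, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that a shallow decision tree is grown on the full data set and that only the points falling in impure (mixed-class) leaves are kept, since those lie near the class boundary and are the likeliest support vector candidates. The names `gini`, `best_split`, `leaves`, `select_boundary_points`, the `max_depth` parameter, and the toy 1-D data are all hypothetical; the reduced index list would then be passed to any SVM trainer.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, idx):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini([y[i] for i in idx])
    for f in range(len(X[0])):
        # Skip the largest value so the right child is never empty.
        for t in sorted({X[i][f] for i in idx})[:-1]:
            left = [y[i] for i in idx if X[i][f] <= t]
            right = [y[i] for i in idx if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(idx)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def leaves(X, y, idx, depth):
    """Yield the sample indices of every leaf of a depth-limited tree."""
    split = best_split(X, y, idx) if depth > 0 else None
    if split is None:  # pure node, unsplittable node, or depth budget used up
        yield idx
        return
    f, t = split
    yield from leaves(X, y, [i for i in idx if X[i][f] <= t], depth - 1)
    yield from leaves(X, y, [i for i in idx if X[i][f] > t], depth - 1)

def select_boundary_points(X, y, max_depth=2):
    """Assumed selection rule: keep only samples that land in mixed-class leaves."""
    keep = []
    for leaf in leaves(X, y, list(range(len(X))), max_depth):
        if len({y[i] for i in leaf}) > 1:
            keep.extend(leaf)
    return sorted(keep)

# Toy data (an assumption for illustration): one feature, class 0 below 45,
# class 1 from 55 upward, and alternating labels in the overlap zone 45..54.
X = [[i] for i in range(100)]
y = [0 if i < 45 else 1 if i >= 55 else i % 2 for i in range(100)]

reduced = select_boundary_points(X, y)
print(len(reduced), "of", len(X), "points kept for SVM training")
```

On this toy set the tree isolates the overlap zone in a single mixed leaf, so only those ten samples are retained. A practical variant would probably also keep a few representatives from pure leaves adjacent to the boundary, which this sketch omits for brevity.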
Pages: 787-798
Page count: 12