Data selection based on decision tree for SVM classification on large data sets

Cited by: 57
Authors
Cervantes, Jair [1]
Garcia Lamont, Farid [1]
Lopez-Chau, Asdrubal [2]
Rodriguez Mazahua, Lisbeth [3]
Sergio Ruiz, J. [1]
Affiliations
[1] CU UAEM Texcoco, Fracc El Tejocote, Texcoco, Mexico
[2] CU UAEM Zumpango, Zumpango 55600, Estado de Mexic, Mexico
[3] Inst Tecnol Orizaba, Div Res & Postgrad Studies, Orizaba 9432, Veracruz, Mexico
Keywords
SVM; Classification; Large data sets; SUPPORT VECTOR MACHINES; ALGORITHM; PROPERTY;
DOI
10.1016/j.asoc.2015.08.048
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Support Vector Machine (SVM) has important properties such as a strong mathematical background and better generalization capability than other classification methods. However, its major drawback lies in the training phase, which is computationally expensive and highly dependent on the size of the input data set. In this study, a new algorithm to speed up SVM training is presented; the method selects a small, representative subset of the data to reduce SVM training time. The novel method uses an induction tree to reduce the training data set for the SVM, producing a very fast and high-accuracy algorithm. According to the results, the proposed algorithm achieves accuracy similar to that of current SVM implementations while training faster. © 2015 Elsevier B.V. All rights reserved.
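The abstract does not spell out how the induction tree selects data, so the following is only a minimal sketch of one plausible tree-guided selection heuristic, not the authors' algorithm. It assumes that the samples most useful to an SVM lie near the class boundary, approximates that boundary with a depth-1 decision tree (a stump), and keeps the samples closest to the split. The names `fit_stump`, `select_near_split`, and `n_keep`, as well as the toy data, are hypothetical and introduced purely for illustration.

```python
import random


def fit_stump(points, labels):
    """Fit a depth-1 decision tree (a stump) by exhaustive search.

    Returns the (feature index, threshold) pair with the fewest training errors.
    """
    best = None
    for f in range(len(points[0])):
        values = sorted(set(p[f] for p in points))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            # Errors when predicting +1 above the threshold and -1 otherwise.
            err = sum(1 for p, y in zip(points, labels)
                      if (1 if p[f] > thr else -1) != y)
            err = min(err, len(points) - err)  # the flipped rule is also allowed
            if best is None or err < best[0]:
                best = (err, f, thr)
    return best[1], best[2]


def select_near_split(points, labels, feature, threshold, n_keep):
    """Keep the n_keep samples closest to the stump's split (assumed heuristic)."""
    order = sorted(range(len(points)),
                   key=lambda i: abs(points[i][feature] - threshold))
    kept = order[:n_keep]
    return [points[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: two Gaussian blobs separated along feature 0.
random.seed(0)
points, labels = [], []
for _ in range(100):
    points.append((random.gauss(-2.0, 1.0), random.gauss(0.0, 1.0)))
    labels.append(-1)
    points.append((random.gauss(2.0, 1.0), random.gauss(0.0, 1.0)))
    labels.append(1)

feature, threshold = fit_stump(points, labels)
reduced_points, reduced_labels = select_near_split(
    points, labels, feature, threshold, n_keep=40)
print(feature, len(reduced_points))
```

The reduced set (`reduced_points`, `reduced_labels`) would then be handed to any SVM trainer in place of the full data set. A deeper tree or a different selection rule may well be closer to what the paper does; the stump is used here only to keep the sketch short.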
Pages: 787-798
Number of pages: 12
Related papers
50 in total
  • [21] Applications of data mining approach based on rough sets and decision tree
    Wu, Cheng-Dong
    Xu, Ke
    Zhang, Hai-Bo
    Liu, Jian-Shun
    Li, Yang
    Shenyang Jianzhu Daxue Xuebao (Ziran Kexue Ban)/Journal of Shenyang Jianzhu University (Natural Science), 2005, 21 (04): : 386 - 389
  • [22] Decision tree's pruning algorithm based on deficient data sets
    Zhang, Y
    Chi, ZX
    Wang, DG
    PDCAT 2005: Sixth International Conference on Parallel and Distributed Computing, Applications and Technologies, Proceedings, 2005, : 1030 - 1032
  • [23] PSO-Based Method for SVM Classification on Skewed Data-Sets
    Cervantes, Jair
    Garcia-Lamont, Farid
    Lopez, Asdrubal
    Rodriguez, Lisbeth
    Ruiz Castilla, Jose S.
    Trueba, Adrian
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, ICIC 2015, PT III, 2015, 9227 : 79 - 86
  • [24] Distributed Multi Class SVM for Large Data Sets
    Govada, Aruna
    Gauri, Bhavul
    Sahay, S. K.
    PROCEEDING OF THE THIRD INTERNATIONAL SYMPOSIUM ON WOMEN IN COMPUTING AND INFORMATICS (WCI-2015), 2015, : 54 - 58
  • [25] Accelerating the SVM learning for very large data sets
    Sung, Eric
    Yan, Zhu
    Li Xuchun
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS, 2006, : 484 - +
  • [26] CART Decision Tree Combined with Boruta Feature Selection for Medical Data Classification
    Tang, Rong
    Zhang, Xiaojun
    2020 5TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS (IEEE ICBDA 2020), 2020, : 80 - 84
  • [27] On node selection for classification in correlated data sets
    Cristescu, Razvan
    2008 42ND ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS, VOLS 1-3, 2008, : 1064 - 1068
  • [28] Improving SVM Classification on Imbalanced Data Sets in Distance Spaces
    Koeknar-Tezel, Suzan
    Latecki, Longin Jan
    2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 259 - +
  • [29] Research on Grain Information Classification based on SVM Decision Tree
    Geng, Ruihuan
    Zhang, Dexian
    Chai, Jiajia
    2012 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC 2012), 2012, : 138 - 141
  • [30] SVM-based local search for gene selection and classification of microarray data
    Hernandez, Jose Crispin Hernandez
    Duval, Beatrice
    Hao, Jin-Kao
    BIOINFORMATICS RESEARCH AND DEVELOPMENT, PROCEEDINGS, 2008, 13 : 499 - 508