Fast data selection for SVM training using ensemble margin

Cited by: 38
Authors
Guo, Li [1 ,2 ]
Boukir, Samia [1 ]
Affiliations
[1] Univ Bordeaux, IPB, G&E Lab EA 4592, F-33670 Pessac, France
[2] CNRS, IMS Lab, UMR 5218, F-33402 Talence, France
Keywords
Boundary points; Instance selection; Ensemble learning; Margin theory; Large data; Support vector machine; SAMPLE; CLASSIFICATION;
DOI
10.1016/j.patrec.2014.08.003
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Support Vector Machine (SVM) is a powerful classification method. However, it suffers from a major drawback: the high memory and time complexity of its training, which constrains the application of SVM to large-scale classification tasks. To accelerate SVM training, a new ensemble margin-based data selection approach is proposed. It relies on a simple and efficient heuristic for providing support vector candidates: selecting the lowest-margin instances. This technique significantly reduces the complexity of the SVM training task while maintaining the accuracy of the SVM classification. A fast alternative to our approach, called SVIS (Small Votes Instance Selection), with great potential for large data problems, is also introduced. (C) 2014 Elsevier B.V. All rights reserved.
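The heuristic described in the abstract can be sketched as follows: train an ensemble, compute each instance's ensemble margin (votes for its true class minus the maximum votes for any other class, normalized by ensemble size), and keep only the lowest-margin instances as support vector candidates before training the SVM. This is an illustrative sketch assuming a supervised margin definition and a 20% selection rate; the exact margin formulation, selection threshold, and the SVIS variant differ in the paper.

```python
# Illustrative sketch of ensemble margin-based instance selection for SVM
# training. The margin definition and the 20% selection rate are assumptions
# based on the abstract, not the paper's exact method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def ensemble_margin(forest, X, y):
    """Supervised ensemble margin in [-1, 1]:
    (votes for the true class - max votes for any other class) / n_trees.
    Low margin indicates a boundary (or noisy) instance."""
    n_samples = X.shape[0]
    n_trees = len(forest.estimators_)
    # votes[i, c] = number of trees voting class index c for instance i.
    # Individual trees in a fitted sklearn forest predict encoded class
    # indices 0..n_classes-1.
    votes = np.zeros((n_samples, len(forest.classes_)), dtype=int)
    for tree in forest.estimators_:
        pred = tree.predict(X).astype(int)
        votes[np.arange(n_samples), pred] += 1
    # Map the original labels to class indices.
    class_index = np.searchsorted(forest.classes_, y)
    true_votes = votes[np.arange(n_samples), class_index]
    # Best vote count among the *other* classes.
    masked = votes.copy()
    masked[np.arange(n_samples), class_index] = -1
    other_votes = masked.max(axis=1)
    return (true_votes - other_votes) / n_trees

# Toy data standing in for a large training set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
margins = ensemble_margin(forest, X, y)

# Keep the lowest-margin fraction as support vector candidates,
# then train the SVM on this much smaller subset.
keep = np.argsort(margins)[: int(0.2 * len(y))]
svm_small = SVC().fit(X[keep], y[keep])
```

The intuition is that high-margin instances lie deep inside class regions and rarely become support vectors, so discarding them shrinks the (roughly quadratic-to-cubic) SVM training cost with little loss of boundary information.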
Pages: 112-119
Page count: 8
References
32 items in total
  • [1] Abe S., 2005, P 1 INT WORKSH MULT
  • [2] [Anonymous], THESIS U BORDEAUX FR
  • [3] Bache K., 2013, UCI Machine Learning Repository
  • [4] Bartlett P., 2000, ADV LARGE MARGIN CLA
  • [5] SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation
    Blewitt, Marnie E.
    Gendrel, Anne-Valerie
    Pang, Zhenyi
    Sparrow, Duncan B.
    Whitelaw, Nadia
    Craig, Jeffrey M.
    Apedaile, Anwyn
    Hilton, Douglas J.
    Dunwoodie, Sally L.
    Brockdorff, Neil
    Kay, Graham F.
    Whitelaw, Emma
    [J]. NATURE GENETICS, 2008, 40 (05) : 663 - 669
  • [6] Boyang Li, 2009, Proceedings 2009 International Joint Conference on Neural Networks (IJCNN 2009 - Atlanta), P1784, DOI 10.1109/IJCNN.2009.5178618
  • [7] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [9] Pasting small votes for classification in large databases and on-line
    Breiman, L
    [J]. MACHINE LEARNING, 1999, 36 (1-2) : 85 - 103
  • [10] Bühlmann P, 2002, ANN STAT, V30, P927