Subsampling the Concurrent AdaBoost Algorithm: An Efficient Approach for Large Datasets

Times Cited: 0
Authors
Allende-Cid, Hector [1 ]
Acuna, Diego [2 ]
Allende, Hector [2 ,3 ]
Affiliations
[1] Pontificia Univ Catolica Valparaiso, Avda Brasil 2241, Valparaiso, Chile
[2] Univ Tecn Federico Santa Maria, Avda Espana 1680, Valparaiso, Chile
[3] Univ Adolfo Ibanez, Padre Hurtado 750, Vina Del Mar, Chile
Source
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2016 | 2017, Vol. 10125
Keywords
Concurrent AdaBoost; Subsampling; Classification; Machine Learning; Large data sets classification; CLASSIFICATION; VARIANTS;
DOI
10.1007/978-3-319-52277-7_39
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this work we propose a subsampled version of the Concurrent AdaBoost algorithm to handle large datasets efficiently. The proposal is based on a concurrent computing approach that improves the distribution-weight estimation in the algorithm, thereby obtaining better generalization capacity. On each round we train several weak hypotheses in parallel and, using a weighted ensemble, update the distribution weights for the following boosting rounds. Instead of creating resamples of the same size as the original dataset, we subsample the data in order to obtain a speed-up in the training phase. We validate our proposal with different resampling sizes on 3 datasets, obtaining promising results and showing that the size of the resamples does not considerably affect the performance of the algorithm, while the execution time improves greatly.
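A minimal, hypothetical Python sketch of the idea the abstract describes: on each boosting round, several weak hypotheses are trained concurrently on weighted subsamples smaller than the original dataset, and the distribution weights are then updated for the next round. The decision-stump weak learners, the joblib-based parallelism, and the choice to keep the lowest-error concurrent hypothesis per round (rather than the paper's weighted combination of the concurrent learners) are assumptions for illustration, not the authors' implementation.

```python
# Sketch of a subsampled, concurrent AdaBoost round (assumptions noted above).
import numpy as np
from joblib import Parallel, delayed
from sklearn.tree import DecisionTreeClassifier

def _fit_weak(X, y, w, subsample_frac, seed):
    """Fit one weak hypothesis on a weighted subsample of the data."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.choice(n, size=int(subsample_frac * n), replace=True, p=w)
    stump = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
    pred = stump.predict(X)
    err = np.sum(w[pred != y])          # weighted error on the full dataset
    return stump, err, pred

def concurrent_adaboost(X, y, rounds=10, n_concurrent=4, subsample_frac=0.3):
    n = len(y)                          # labels assumed to be in {-1, +1}
    w = np.full(n, 1.0 / n)             # distribution weights
    ensemble = []
    for t in range(rounds):
        # Train several weak hypotheses concurrently, each on its own subsample.
        results = Parallel(n_jobs=n_concurrent)(
            delayed(_fit_weak)(X, y, w, subsample_frac, t * n_concurrent + k)
            for k in range(n_concurrent)
        )
        # Keep the concurrent hypothesis with the lowest weighted error
        # (one possible aggregation; the paper uses a weighted ensemble).
        stump, err, pred = min(results, key=lambda r: r[1])
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Update the distribution weights with the selected hypothesis.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()

    def predict(X_new):
        score = sum(a * h.predict(X_new) for a, h in ensemble)
        return np.sign(score)
    return predict
```

Because each weak learner sees only a fraction of the data (subsample_frac), the per-round training cost drops roughly in proportion to the subsample size, which is the source of the speed-up reported in the abstract.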
Pages: 318-325
Number of Pages: 8