A Classifier Using Online Bagging Ensemble Method for Big Data Stream Learning

Cited by: 22
Authors
Lv, Yanxia [1]
Peng, Sancheng [2, 3]
Yuan, Ying [1]
Wang, Cong [1]
Yin, Pengfei [4]
Liu, Jiemin [1]
Wang, Cuirong [1]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110819, Liaoning, Peoples R China
[2] Guangdong Univ Foreign Studies, Lab Language Engn & Comp, Guangzhou 510006, Guangdong, Peoples R China
[3] Guangdong Univ Foreign Studies, Sch Cyber Secur, Guangzhou 510006, Guangdong, Peoples R China
[4] Cent South Univ, Sch Informat Sci & Engn, Changsha 410083, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
big data stream; classification; online bagging; ensemble learning; concept drift
DOI
10.26599/TST.2018.9010119
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In big data stream classification under concept drift, an ensemble that combines multiple weak learners can achieve better generalization performance than a single learner. In this paper, we present an efficient classifier that uses the online bagging ensemble method for big data stream learning. The classifier introduces an efficient online resampling mechanism for the training instances and uses a robust coding method based on error-correcting output codes, in order to reduce the effects of correlations between the classifiers and increase the diversity of the ensemble. A dynamic updating model based on classification performance is adopted to avoid unnecessary updating operations and improve learning efficiency. We also implement a parallel version of EoBag, which runs faster than the serial version with almost the same classification performance. Finally, we compare classification performance and resource usage with other state-of-the-art algorithms on both artificial and real data sets. The results show that the proposed algorithm achieves better accuracy and more feasible resource usage for big data stream classification.
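The online resampling idea mentioned in the abstract can be illustrated with a minimal sketch of classic online bagging, in which each base learner trains on every incoming instance k times, with k drawn from a Poisson(1) distribution. This is the standard textbook scheme and an assumption here, not necessarily the resampling mechanism the paper proposes; the error-correcting output codes and the dynamic updating model are omitted. The names `poisson`, `OnlineCentroid`, and `OnlineBagging` are hypothetical, and the base learner is a toy nearest-centroid model chosen only to keep the example self-contained.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Draw one sample from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class OnlineCentroid:
    """Toy incremental base learner (assumption): running per-class feature means."""

    def __init__(self):
        self.sums = {}            # class label -> per-feature running sum
        self.counts = Counter()   # class label -> number of updates seen

    def learn_one(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        for i, v in enumerate(x):
            self.sums[y][i] += v
        self.counts[y] += 1

    def predict_one(self, x):
        if not self.counts:
            return None  # nothing learned yet

        def sq_dist(y):
            n = self.counts[y]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[y]))

        # Predict the class whose centroid is closest to x.
        return min(self.counts, key=sq_dist)


class OnlineBagging:
    """Online bagging: each model sees each instance Poisson(lam) times."""

    def __init__(self, n_models=10, lam=1.0, seed=0):
        self.rng = random.Random(seed)
        self.lam = lam
        self.models = [OnlineCentroid() for _ in range(n_models)]

    def learn_one(self, x, y):
        for model in self.models:
            # The random weight k simulates bootstrap resampling on a stream.
            for _ in range(poisson(self.lam, self.rng)):
                model.learn_one(x, y)

    def predict_one(self, x):
        votes = Counter()
        for model in self.models:
            pred = model.predict_one(x)
            if pred is not None:
                votes[pred] += 1
        # Majority vote over the models that have learned something.
        return votes.most_common(1)[0][0] if votes else None
```

Instances are consumed one at a time through `learn_one`, so no data needs to be stored. Because each model receives a different random weighting of the same stream, the ensemble members diverge, which is the source of diversity that bagging relies on.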
Pages: 379-388
Page count: 10