ElStream: An Ensemble Learning Approach for Concept Drift Detection in Dynamic Social Big Data Stream Learning

Cited by: 67
Authors
Abbasi, Ahmad [1]
Javed, Abdul Rehman [2]
Chakraborty, Chinmay [3]
Nebhen, Jamel [4]
Zehra, Wisha [1]
Jalil, Zunera [2]
Affiliations
[1] Air Univ, Fac Comp & AI, Islamabad 44000, Pakistan
[2] Air Univ, Dept Cyber Secur, Islamabad 44000, Pakistan
[3] Birla Inst Technol, Dept Elect & Commun Engn, Ranchi 835215, Bihar, India
[4] Prince Sattam Bin Abdulaziz Univ, Coll Comp Sci & Engn, Al Kharj 11942, Saudi Arabia
Keywords
Big Data; Machine learning; Light emitting diodes; Training; Data models; Standards; Licenses; Internet of Things; big data; smart concept drift; social data; online learning; ensemble learning; HETEROGENEOUS ENSEMBLE; ONLINE; CLASSIFIER;
DOI
10.1109/ACCESS.2021.3076264
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the rapid growth of communication technologies and smart devices, data traffic has surged. Different applications, users, and devices generate huge amounts of data every second. This creates a need for solutions that can analyze how data changes over time in unforeseen ways, despite resource constraints. Such unforeseeable changes in the underlying distribution of streaming data over time are called concept drifts. This paper presents a novel approach named ElStream that detects concept drift using ensemble and conventional machine learning techniques on both real and artificial data. ElStream uses majority voting in which only the optimum classifiers vote on the decision. Experiments were conducted to evaluate the performance of the proposed approach. According to the experimental analysis, the ensemble learning approach performs consistently on both artificial and real-world data sets. The experiments show that ElStream improves accuracy by 12.49%, 11.98%, 10.06%, 1.2%, and 0.33% on the PokerHand, LED, Random RBF, Electricity, and SEA datasets, respectively, compared with previous state-of-the-art studies and conventional machine learning algorithms.
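The voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the "optimum" classifiers are chosen, so ranking them by accuracy on a recent labelled window is an assumption here, and the names `window_accuracy`, `select_voters`, `majority_vote`, and `top_k` are all hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

# A classifier is modelled as any callable mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], Hashable]
Sample = Tuple[Sequence[float], Hashable]


def window_accuracy(clf: Classifier, window: List[Sample]) -> float:
    """Fraction of (x, y) pairs in the recent window that clf labels correctly."""
    if not window:
        return 0.0
    return sum(clf(x) == y for x, y in window) / len(window)


def select_voters(classifiers: List[Classifier],
                  window: List[Sample],
                  top_k: int) -> List[Classifier]:
    """Assumed selection rule: keep the top_k classifiers by window accuracy."""
    ranked = sorted(classifiers,
                    key=lambda c: window_accuracy(c, window),
                    reverse=True)
    return ranked[:top_k]


def majority_vote(voters: List[Classifier], x: Sequence[float]) -> Hashable:
    """Return the most common predicted label among the selected voters."""
    votes = [clf(x) for clf in voters]
    return Counter(votes).most_common(1)[0][0]


# Toy base learners (hypothetical stand-ins for trained models).
def always_zero(x): return 0
def always_one(x): return 1
def threshold(x): return int(x[0] > 0.5)
def inverted(x): return int(x[0] <= 0.5)


recent_window = [((0.9,), 1), ((0.1,), 0), ((0.8,), 1), ((0.7,), 1)]
voters = select_voters([always_zero, always_one, threshold, inverted],
                       recent_window, top_k=3)
prediction = majority_vote(voters, (0.95,))
```

Re-running `select_voters` on a sliding window lets the set of voters change as the stream drifts, which is one plausible way to restrict voting to the currently best-performing classifiers.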
Pages: 66408-66419
Page count: 12