Incremental Rebalancing Learning on Evolving Data Streams

Cited by: 17
Authors
Bernardo, Alessio [1]
Valle, Emanuele Della [1]
Bifet, Albert [2, 3]
Affiliations
[1] DEIB Politecn Milano, Milan, Italy
[2] Univ Waikato, Hamilton, New Zealand
[3] Telecom ParisTech, LTCI, Palaiseau, France
Source
20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2020) | 2020
Keywords
Evolving Data Stream; Streaming; Concept Drift; MOA; Balancing
DOI
10.1109/ICDMW51313.2020.00121
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Nowadays, every device connected to the Internet generates an ever-growing (formally, unbounded) stream of data. Machine Learning on data streams is a grand challenge due to its resource constraints. Indeed, standard machine learning techniques are not able to deal with data whose statistics are subject to gradual or sudden changes (formally, concept drift) without any warning. Massive Online Analysis (MOA) is the collective name, as well as a software library, for new learners that can manage data streams. In this paper, we present a research study on streaming rebalancing. Indeed, data streams can be imbalanced as static data, but there is not a method to rebalance them incrementally. For this reason, we propose a new streaming approach able to rebalance data streams online. Our new methodology is evaluated against some synthetically generated datasets using prequential evaluation to demonstrate that it outperforms the existing approaches.
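The abstract refers to prequential evaluation on imbalanced streams. As a rough illustration of that evaluation scheme only (not the paper's rebalancing method and not MOA's API), the sketch below runs a test-then-train loop in plain Python. The names `MajorityClassLearner`, `imbalanced_stream`, and `prequential_evaluate`, the 10% minority ratio, and the Gaussian feature are all assumptions made for this example.

```python
import random
from collections import Counter


class MajorityClassLearner:
    """Hypothetical toy learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Before any training, fall back to label 0.
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1


def imbalanced_stream(n, minority_ratio=0.1, seed=42):
    """Yield (x, y) pairs where label 1 is the minority class (assumed setup)."""
    rng = random.Random(seed)
    for _ in range(n):
        y = 1 if rng.random() < minority_ratio else 0
        x = rng.gauss(2.0 * y, 1.0)
        yield x, y


def prequential_evaluate(learner, stream):
    """Test-then-train: predict each instance first, then learn from it."""
    correct = total = 0
    minority_hits = minority_total = 0
    for x, y in stream:
        y_hat = learner.predict(x)
        correct += int(y_hat == y)
        total += 1
        if y == 1:
            minority_total += 1
            minority_hits += int(y_hat == 1)
        learner.learn(x, y)
    accuracy = correct / total if total else 0.0
    minority_recall = minority_hits / minority_total if minority_total else 0.0
    return accuracy, minority_recall


accuracy, minority_recall = prequential_evaluate(
    MajorityClassLearner(), imbalanced_stream(5000)
)
print(f"accuracy={accuracy:.3f} minority_recall={minority_recall:.3f}")
```

On a stream like this, a learner that ignores the minority class can still score a high prequential accuracy while its minority recall stays near zero, which is the kind of gap that online rebalancing is meant to address.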
Pages: 844-850
Number of pages: 7
Related papers
11 entries in total
[1]  
Bifet Albert, 2013, Machine Learning and Knowledge Discovery in Databases. European Conference, ECML PKDD 2013. Proceedings: LNCS 8188, P465, DOI 10.1007/978-3-642-40988-2_30
[2]   Efficient Online Evaluation of Big Data Stream Classifiers [J].
Bifet, Albert ;
Morales, Gianmarco De Francisci ;
Read, Jesse ;
Holmes, Geoff ;
Pfahringer, Bernhard .
KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, :59-68
[3]  
Bifet A, 2009, LECT NOTES COMPUT SC, V5772, P249, DOI 10.1007/978-3-642-03915-7_22
[4]  
Bifet A, 2007, PROCEEDINGS OF THE SEVENTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, P443
[5]  
Breiman L., 2001, Machine Learning, V45, P5
[6]   SMOTE: Synthetic minority over-sampling technique [J].
Chawla, Nitesh V. ;
Bowyer, Kevin W. ;
Hall, Lawrence O. ;
Kegelmeyer, W. Philip .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2002, 16 :321-357
[7]  
Domingos P., 2000, Proceedings. KDD-2000. Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, P71, DOI 10.1145/347090.347107
[8]   Adaptive random forests for evolving data stream classification [J].
Gomes, Heitor M. ;
Bifet, Albert ;
Read, Jesse ;
Barddal, Jean Paul ;
Enembreck, Fabricio ;
Pfahringer, Bernhard ;
Holmes, Geoff ;
Abdessalem, Talel .
MACHINE LEARNING, 2017, 106 (9-10) :1469-1495
[9]   Learning from Imbalanced Data [J].
He, Haibo ;
Garcia, Edwardo A. .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (09) :1263-1284
[10]  
Hulten G., 2001, KDD-2001. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, P97, DOI 10.1145/502512.502529