Concept drift detection and accelerated convergence of online learning

Cited by: 6
Authors
Guo, Husheng [1, 2]
Li, Hai [1]
Sun, Ni [1]
Ren, Qiaoyan [1]
Zhang, Aijuan [1]
Wang, Wenjian [1, 2]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
[2] Shanxi Univ, Key Lab Computat Intelligence & Chinese Informat, Minist Educ, Taiyuan 030006, Shanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Streaming data; Concept drift; Authenticity; Model convergence; NEURAL-NETWORKS; DATA STREAMS; ENSEMBLE; CLASSIFICATION; MODELS;
DOI
10.1007/s10115-022-01790-6
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Streaming data has become an important form of data in the era of big data, and concept drift, one of its most important problems, has been studied extensively. However, noise and overly small training samples also cause fluctuations in classification performance, which are easily confused with true concept drift. To solve this problem, an improved concept drift detection method is proposed, and accelerated convergence of the model after concept drift is also studied. First, effective fluctuation sites are obtained by a group detection method. Second, the authenticity of a concept drift is determined by tracking the testing accuracy at reference sites near the effective fluctuation site. Finally, in the convergence acceleration stage, a time sequential distance is designed to measure the similarity of sequential data blocks from different time periods, and the noncritical disturbance data with the largest time sequential distance are removed sequentially to improve the convergence speed of the model after concept drift occurs. Experimental results demonstrate that the proposed method not only better distinguishes true from false concept drift but also improves the convergence speed of the model.
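The first two stages described in the abstract (finding fluctuation sites, then checking nearby reference sites) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the detection rule, so the per-block accuracy list, the function names, and the parameters drop_threshold, n_reference, and recovery_margin are all hypothetical choices made for illustration only.

```python
from statistics import mean


def find_fluctuation_sites(block_acc, drop_threshold=0.1):
    """Return indices of blocks whose accuracy drops by more than
    drop_threshold relative to the previous block (assumed rule)."""
    return [
        i
        for i in range(1, len(block_acc))
        if block_acc[i - 1] - block_acc[i] > drop_threshold
    ]


def is_true_drift(block_acc, site, n_reference=3, recovery_margin=0.05):
    """Assumed authenticity check: the fluctuation counts as a true drift
    if the mean accuracy of the following reference blocks stays more than
    recovery_margin below the accuracy just before the fluctuation."""
    reference = block_acc[site + 1 : site + 1 + n_reference]
    if not reference:
        return False  # no reference blocks yet, so no evidence of drift
    return block_acc[site - 1] - mean(reference) > recovery_margin


# Hypothetical per-block test accuracies: a one-block dip at index 2
# (noise-like) and a sustained drop starting at index 6 (drift-like).
acc = [0.90, 0.91, 0.70, 0.89, 0.90, 0.91, 0.60, 0.62, 0.63, 0.65]
sites = find_fluctuation_sites(acc)
verdicts = {s: is_true_drift(acc, s) for s in sites}
```

In this toy data the dip at block 2 recovers immediately and is classified as a false drift, while the drop at block 6 persists across the reference blocks and is classified as a true drift. The convergence acceleration stage is omitted because the abstract does not define the time sequential distance.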
Pages: 1005-1043
Page count: 39