A Probabilistic Framework for Adapting to Changing and Recurring Concepts in Data Streams

Times cited: 1
Authors
Halstead, Ben [1]
Koh, Yun Sing [1]
Riddle, Patricia [1]
Pechenizkiy, Mykola [2]
Bifet, Albert [3, 4]
Affiliations
[1] Univ Auckland, Sch Comp Sci, Auckland, New Zealand
[2] Eindhoven Univ Technol, Eindhoven, Netherlands
[3] Univ Waikato, Hamilton, New Zealand
[4] IP Paris, LTCI, Telecom Paris, Paris, France
Source
2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA) | 2022
Keywords
Data Streams; Recurring Concepts; Dynamic Classifier Selection; Concept Drift
DOI
10.1109/DSAA54385.2022.10032368
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The distribution of streaming data often changes over time as conditions change, a phenomenon known as concept drift. Only a subset of previous experience, collected in similar conditions, is relevant to learning an accurate classifier for current data, and learning from irrelevant experience that describes a different concept can degrade performance. A system learning from streaming data must therefore identify which recent experience becomes irrelevant when conditions change and which past experience becomes relevant again when concepts recur, e.g., when weather events or financial patterns repeat. Existing streaming approaches fall short in one of three ways: some do not consider that the relevance of experience can change over time and thus cannot handle concept drift; some consider only the recency of experience and thus cannot handle recurring concepts; and others evaluate relevance only sparsely and thus fail when a concept drift is missed. To enable learning in changing conditions, we propose SELeCT, a probabilistic method for continuously evaluating the relevance of past experience. SELeCT maintains a distinct internal state for each concept, representing the experience relevant to that concept with a dedicated classifier. We propose a Bayesian algorithm for estimating state relevance that combines the likelihood of drawing recent observations from a given state with a transition-pattern prior based on the system's current state. The current state is continuously maintained using a Hoeffding-bound-based algorithm which, unlike existing methods, guarantees that every observation is classified using the state estimated as most relevant, while also maintaining temporal stability. We find that SELeCT selects experience relevant to ground-truth concepts with recall and precision above 0.9, significantly outperforming existing methods and approaching a theoretical optimum; this leads to significantly higher accuracy and enables new opportunities for learning in complex, changing conditions.
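The abstract names two mechanisms: a Bayesian relevance estimate (likelihood of recent observations under each state multiplied by a transition prior conditioned on the current state) and a Hoeffding-bound rule that decides when to change the active state. The Python below is a minimal, hypothetical sketch of how those two ideas can fit together; it is not SELeCT's published implementation. Every name (`StateRelevanceSketch`, `state_models`, `delta`, `smoothing`, `update`) is an illustrative assumption, as are the modelling choices: each state is summarised by a single Gaussian over one scalar stream statistic, the transition prior is a smoothed count of past switches, and the Hoeffding test is applied to the per-window gap in posterior probability between a candidate state and the current one.

```python
import math
from collections import defaultdict


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate normal distribution."""
    var = std * std
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


class StateRelevanceSketch:
    """Hypothetical sketch: Bayesian state relevance + Hoeffding-bound switching.

    ASSUMPTIONS (illustrative, not taken from the SELeCT paper):
    - each state is summarised by a Gaussian (mean, std) over one scalar
      statistic of the stream, stored in `state_models`;
    - the transition prior is a smoothed count of previously observed switches;
    - the Hoeffding test compares mean posterior probabilities.
    """

    def __init__(self, state_models, current, delta=0.05, smoothing=1.0):
        self.state_models = state_models  # {state_id: (mean, std)}
        self.current = current            # state whose classifier is in use
        self.delta = delta                # Hoeffding confidence parameter
        self.smoothing = smoothing        # additive smoothing for the prior
        self.transitions = defaultdict(lambda: defaultdict(float))
        self.gap_sum = defaultdict(float)  # per-candidate sum of posterior gaps
        self.n = 0                         # windows seen since the last switch

    def prior(self, state):
        # P(state | current state) from smoothed transition counts.
        row = self.transitions[self.current]
        total = sum(row.values()) + self.smoothing * len(self.state_models)
        return (row[state] + self.smoothing) / total

    def posterior(self, window):
        # log P(s | window) is proportional to sum_x log p(x | s) + log P(s | current).
        log_scores = {}
        for s, (mu, sd) in self.state_models.items():
            log_lik = sum(gaussian_logpdf(x, mu, sd) for x in window)
            log_scores[s] = log_lik + math.log(self.prior(s))
        # Normalise in log space for numerical stability.
        m = max(log_scores.values())
        unnorm = {s: math.exp(v - m) for s, v in log_scores.items()}
        z = sum(unnorm.values())
        return {s: v / z for s, v in unnorm.items()}

    def hoeffding_epsilon(self, value_range=2.0):
        # epsilon = sqrt(R^2 ln(1/delta) / (2n)); a posterior gap lies in [-1, 1], so R = 2.
        return math.sqrt(value_range ** 2 * math.log(1 / self.delta) / (2 * self.n))

    def update(self, window):
        """Score the latest window and return the state used to classify it."""
        post = self.posterior(window)
        self.n += 1
        for s in self.state_models:
            if s != self.current:
                self.gap_sum[s] += post[s] - post[self.current]
        best = max(self.gap_sum, key=self.gap_sum.get, default=None)
        # Switch only when the candidate's mean advantage exceeds the bound.
        if best is not None and self.gap_sum[best] / self.n > self.hoeffding_epsilon():
            self.transitions[self.current][best] += 1
            self.current = best
            self.gap_sum.clear()
            self.n = 0
        return self.current
```

In this sketch the active state changes only after the evidence against it is statistically convincing, which illustrates the trade-off the abstract describes between always using the most relevant state and staying temporally stable: with `delta=0.05` and windows that clearly favour another state, the switch happens after a handful of windows rather than on the first one.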
Pages: 407-416
Number of pages: 10