Adaptive ensemble of self-adjusting nearest neighbor subspaces for multi-label drifting data streams

Cited by: 27
Authors
Alberghini, Gavin [1 ]
Barbon, Sylvio, Jr. [2 ]
Cano, Alberto [1 ]
Affiliations
[1] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
[2] Univ Trieste, Dept Engn & Architecture, Trieste, Italy
Keywords
Multi-label stream; Ensemble learning; Data stream; Concept drift; BINARY RELEVANCE; CLASSIFICATION; CLASSIFIERS; MODELS;
DOI
10.1016/j.neucom.2022.01.075
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-label data streams are sequences of multi-label instances arriving over time to a multi-label classifier. The properties of the stream may continuously change due to concept drift. Therefore, algorithms must constantly adapt to the new data distributions. In this paper we propose a novel ensemble method for multi-label drifting streams named Adaptive Ensemble of Self-Adjusting Nearest Neighbor Subspaces (AESAKNNS). It leverages a self-adjusting kNN as a base classifier with the advantages of ensembles to adapt to concept drift in the multi-label environment. To promote diverse knowledge within the ensemble, each base classifier is given a unique subset of features and samples to train on. These samples are distributed to classifiers in a probabilistic manner that follows a Poisson distribution as in online bagging. Accompanying these mechanisms, a collection of ADWIN detectors monitor each classifier for the occurrence of a concept drift on the subspace. Upon detection, the algorithm automatically trains additional classifiers in the background to attempt to capture new concepts on new subspaces of features. The dynamic classifier selection chooses the most accurate classifiers from the active and background ensembles to replace the current ensemble. Our experimental study compares the proposed approach with 30 other classifiers, including problem transformation, algorithm adaptation, kNNs, and ensembles on 30 diverse multi-label datasets and 12 performance metrics. Results, validated using non-parametric statistical analysis, support the better performance of the AESAKNNS and highlight the contribution of its components in improving the performance of the ensemble. (c) 2022 Elsevier B.V. All rights reserved.
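The abstract describes two diversity mechanisms: each base classifier trains on a unique random feature subspace, and each incoming instance is replicated per classifier with a weight drawn from a Poisson distribution, as in online bagging. A minimal sketch of these two mechanisms in Python is shown below; the function names and the Poisson sampler are illustrative assumptions, not the authors' implementation, which additionally couples these with self-adjusting kNN base learners and ADWIN drift detectors.

```python
import math
import random


def poisson(lam, rng):
    """Sample from Poisson(lam) via Knuth's algorithm (suitable for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def random_subspaces(n_features, n_classifiers, subspace_size, rng):
    """Assign each base classifier a distinct random feature subset
    to promote diversity within the ensemble."""
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_classifiers)]


def online_bagging_weights(n_classifiers, rng, lam=1.0):
    """For one incoming instance, give each base classifier a training
    weight w ~ Poisson(lam): the classifier sees the instance w times."""
    return [poisson(lam, rng) for _ in range(n_classifiers)]


# Illustrative usage: a 10-classifier ensemble over 20 features,
# each classifier restricted to a 5-feature subspace.
rng = random.Random(42)
subspaces = random_subspaces(n_features=20, n_classifiers=10,
                             subspace_size=5, rng=rng)
weights = online_bagging_weights(n_classifiers=10, rng=rng)
```

In the full algorithm, a drift alarm raised by any classifier's ADWIN detector would additionally trigger training of background classifiers on fresh subspaces, with the most accurate active and background members retained.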
Pages: 228-248
Number of pages: 21