EACD: evolutionary adaptation to concept drifts in data streams

被引:24
作者
Ghomeshi, Hossein [1 ]
Gaber, Mohamed Medhat [1 ]
Kovalchuk, Yevgeniya [1 ]
机构
[1] Birmingham City Univ, Sch Comp & Digital Technol, Birmingham, W Midlands, England
关键词
Data streams; Ensemble learning; Concept drifts; Evolutionary algorithms; Genetic algorithm; Non-stationary environments; RANDOM FORESTS;
D O I
10.1007/s10618-019-00614-6
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper presents a novel ensemble learning method based on evolutionary algorithms to cope with different types of concept drifts in non-stationary data stream classification tasks. In ensemble learning, multiple learners forming an ensemble are trained to obtain a better predictive performance compared to that of a single learner, especially in non-stationary environments, where data evolve over time. The evolution of data streams can be viewed as a problem of changing environment, and evolutionary algorithms offer a natural solution to this problem. The method proposed in this paper uses random subspaces of features from a pool of features to create different classification types in the ensemble. Each such type consists of a limited number of classifiers (decision trees) that have been built at different times over the data stream. An evolutionary algorithm (replicator dynamics) is used to adapt to different concept drifts; it allows the types with a higher performance to increase and those with a lower performance to decrease in size. Genetic algorithm is then applied to build a two-layer architecture based on the proposed technique to dynamically optimise the combination of features in each type to achieve a better adaptation to new concepts. The proposed method, called EACD, offers both implicit and explicit mechanisms to deal with concept drifts. A set of experiments employing four artificial and five real-world data streams is conducted to compare its performance with that of the state-of-the-art algorithms using the immediate and delayed prequential evaluation methods. The results demonstrate favourable performance of the proposed EACD method in different environments.
引用
收藏
页码:663 / 694
页数:32
相关论文
共 42 条
[1]  
[Anonymous], 1999, CUP KDD 1999 DATA
[2]  
Baena-Garcia M., 2006, 4 INT WORKSH KNOWL D, V6, P77
[3]  
Bifet A, 2010, LECT NOTES ARTIF INT, V6321, P135, DOI 10.1007/978-3-642-15880-3_15
[4]  
Bifet A, 2010, J MACH LEARN RES, V11, P1601
[5]  
Bifet A, 2009, KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P139
[6]  
Bifet A, 2007, PROCEEDINGS OF THE SEVENTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, P443
[7]   Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables [J].
Blackard, JA ;
Dean, DJ .
COMPUTERS AND ELECTRONICS IN AGRICULTURE, 1999, 24 (03) :131-151
[8]   SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation [J].
Blewitt, Marnie E. ;
Gendrel, Anne-Valerie ;
Pang, Zhenyi ;
Sparrow, Duncan B. ;
Whitelaw, Nadia ;
Craig, Jeffrey M. ;
Apedaile, Anwyn ;
Hilton, Douglas J. ;
Dunwoodie, Sally L. ;
Brockdorff, Neil ;
Kay, Graham F. ;
Whitelaw, Emma .
NATURE GENETICS, 2008, 40 (05) :663-669
[10]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32