Boosting decision stumps for dynamic feature selection on data streams

Times cited: 28
Authors
Barddal, Jean Paul [1]
Enembreck, Fabricio [1]
Gomes, Heitor Murilo [2]
Bifet, Albert [2]
Pfahringer, Bernhard [3]
Affiliations
[1] Pontificia Univ Catolica Parana, Grad Program Informat PPGIa, Curitiba, Parana, Brazil
[2] Univ Paris Saclay, Inst Mines Telecom, Telecom ParisTech, INFRES, Paris, France
[3] Univ Waikato, Dept Comp Sci, Hamilton, New Zealand
Keywords
Data stream mining; Feature selection; Concept drift; Feature drift; ONLINE; CLASSIFICATION; MACHINE; DRIFT
DOI
10.1016/j.is.2019.02.003
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Feature selection aims to identify which features of a dataset are relevant to the learning task. It is also widely used to reduce computation time and resource requirements, to mitigate the curse of dimensionality, and to improve the generalization of classifiers. In data streams, classifiers can benefit from all of the above but, more importantly, from a selection that follows the relevant subset of features as it drifts over time. In this paper, we propose a novel dynamic feature selection method for data streams called Adaptive Boosting for Feature Selection (ABFS). ABFS chains decision stumps and drift detectors and, as a result, identifies with reasonable success which features are relevant to the learning task as the stream progresses. In addition to the proposed algorithm, we bring feature selection-specific metrics from batch learning to streaming scenarios. We then evaluate ABFS according to these metrics in both synthetic and real-world scenarios. ABFS improves the classification rates of different types of learners and, in some cases, also improves computational resource usage. (C) 2019 Elsevier Ltd. All rights reserved.
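The abstract states only that ABFS chains decision stumps and drift detectors. The Python sketch below illustrates one way such a chain could track a drifting feature subset; it is not the authors' algorithm, and every name in it (`ABFSSketch`, `StreamingStump`, `grace_period`, `max_stumps`) is hypothetical. It assumes discrete features; a boosting-style cascade in which an instance reaches the next stump only when the current one misclassifies it; later stumps barred from splitting on features already used earlier in the chain; a grace period before the chain grows; and DDM (Gama et al., 2004) as the per-stump detector, since the abstract does not name one. When a stump's detector fires, that stump and all later ones are discarded, and the selected subset is the union of the features the surviving stumps split on.

```python
import math
from collections import defaultdict


class DDM:
    """Drift Detection Method; a stand-in, the detector ABFS uses is an assumption here."""

    def __init__(self, min_instances=30, drift_level=3.0):
        self.min_instances, self.drift_level = min_instances, drift_level
        self.reset()

    def reset(self):
        self.n, self.p, self.s = 0, 1.0, 0.0
        self.p_min = self.s_min = float("inf")

    def update(self, error):
        """Feed a 0/1 error; return True when the error rate rises significantly."""
        self.n += 1
        self.p += (error - self.p) / self.n  # running error rate
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_instances:
            return False
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return True
        return False


class StreamingStump:
    """Incremental stump over discrete features, split chosen by majority-class accuracy."""

    def __init__(self):
        # counts[feature][value][label] -> number of training instances
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        self.feature, self.seen = None, 0

    def _correct(self, f):
        # training instances the "majority label per value of f" rule gets right
        return sum(max(labels.values()) for labels in self.counts[f].values())

    def learn(self, x, y, excluded):
        self.seen += 1
        for f, v in enumerate(x):
            self.counts[f][v][y] += 1
        allowed = [f for f in range(len(x)) if f not in excluded]
        self.feature = max(allowed, key=self._correct, default=None)  # ties: lowest index

    def predict(self, x):
        if self.feature is None:
            return None
        labels = self.counts[self.feature].get(x[self.feature])
        return max(labels, key=labels.get) if labels else None


class ABFSSketch:
    """Hypothetical chain of (stump, detector) pairs; selection = features the stumps use."""

    def __init__(self, max_stumps=5, grace_period=30):
        self.max_stumps, self.grace_period = max_stumps, grace_period
        self.stages = []  # list of (StreamingStump, DDM)

    def selected_features(self):
        return {stump.feature for stump, _ in self.stages if stump.feature is not None}

    def learn_one(self, x, y):
        excluded = set()  # features already used earlier in the chain
        for i in range(self.max_stumps):
            if i == len(self.stages):
                # grow only once the previous stump had its grace period and still erred
                if i > 0 and self.stages[i - 1][0].seen < self.grace_period:
                    return
                self.stages.append((StreamingStump(), DDM()))
            stump, detector = self.stages[i]
            error = int(stump.predict(x) != y)  # test-then-train
            if detector.update(error):
                del self.stages[i:]  # drift: drop this stump and every later one
                return
            stump.learn(x, y, excluded)
            if not error:
                return  # handled correctly; later stumps never see it
            if stump.feature is not None:
                excluded.add(stump.feature)


def stream(n, target, n_features=5):
    """Deterministic toy stream: features are low bits of a counter, label is one bit."""
    for t in range(n):
        x = [(t >> f) & 1 for f in range(n_features)]
        yield x, x[target]


if __name__ == "__main__":
    model = ABFSSketch()
    for x, y in stream(2000, target=0):
        model.learn_one(x, y)
    print(model.selected_features())  # prints {0}
    for x, y in stream(2000, target=2):  # abrupt feature drift
        model.learn_one(x, y)
    print(model.selected_features())  # prints {2}
```

On this toy stream the label is first bit 0 and then abruptly becomes bit 2, and the selected subset moves from `{0}` to `{2}` once the first stump's detector fires. Discarding every stump after the one that drifted reflects the assumption that later stages were trained on the earlier stages' mistakes, which stop being meaningful after a drift.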
Pages: 13-29
Page count: 17