Iterative Subset Selection for Feature Drifting Data Streams

Cited by: 8
Authors
Yuan, Lanqin [1]
Pfahringer, Bernhard [2]
Barddal, Jean Paul [3]
Affiliations
[1] Univ Waikato, Hamilton, New Zealand
[2] Univ Auckland, Department Comp Sci, Auckland, New Zealand
[3] Pontificia Univ Catolica Parana, Programa Posgrad Informat, Curitiba, Parana, Brazil
Source
33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING | 2018
Keywords
Data Stream Mining; Feature Selection; Concept Drift; Embedded Feature Selection; Iterative Subset Selection;
DOI
10.1145/3167132.3167188
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Feature selection has been studied and shown to improve classifier performance in standard batch data mining but is mostly unexplored in data stream mining. Feature selection becomes even more important when the relevant subset of features changes over time, as the underlying concept of a data stream drifts. This specific kind of drift is known as feature drift and requires specific techniques not only to determine which features are the most important but also to take advantage of them. This paper presents Iterative Subset Selection (ISS), a novel feature subset selection method specialized for feature drift, which splits the feature selection process into two stages by first ranking the features and then iteratively selecting features from the ranking. Applying our feature selection method together with Naive Bayes or k-Nearest Neighbour as a classifier results in compelling accuracy improvements compared to prior work.
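The two-stage idea in the abstract (rank the features, then iteratively select from the ranking) can be illustrated with a minimal sketch. This is not the authors' ISS implementation: the relevance score (absolute difference of class-conditional means for binary labels), the nearest-centroid classifier, the rule of keeping the best-scoring prefix of the ranking, and all function names are assumptions chosen only to keep the example short and dependency-free.

```python
from statistics import mean


def rank_features(X, y):
    """Stage 1 (assumed scoring): rank feature indices by the absolute
    difference between the class-conditional means, for labels 0 and 1."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    # Most relevant feature first; ties keep the original index order.
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def centroid_accuracy(X_train, y_train, X_val, y_val, subset):
    """Accuracy of a nearest-centroid classifier (a hypothetical stand-in
    for Naive Bayes or k-NN) restricted to the features in `subset`."""
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroids[label] = [mean(r[j] for r in rows) for j in subset]
    correct = 0
    for row, label in zip(X_val, y_val):
        pred = min(
            centroids,
            key=lambda c: sum(
                (row[j] - m) ** 2 for j, m in zip(subset, centroids[c])
            ),
        )
        correct += pred == label
    return correct / len(y_val)


def select_subset(X_train, y_train, X_val, y_val):
    """Stage 2 (assumed rule): walk down the ranking, growing the subset one
    feature at a time, and keep the smallest prefix with the best accuracy."""
    ranking = rank_features(X_train, y_train)
    best_subset, best_acc = [], -1.0
    for k in range(1, len(ranking) + 1):
        subset = ranking[:k]
        acc = centroid_accuracy(X_train, y_train, X_val, y_val, subset)
        if acc > best_acc:  # strict improvement, so smaller subsets win ties
            best_subset, best_acc = subset, acc
    return best_subset, best_acc
```

In a streaming setting, a sketch like this would be re-run on a recent window of instances, so that the selected subset can change when a different feature becomes relevant after a drift.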
Pages: 510-517
Page count: 8
Related papers
50 in total
  • [1] Addressing Feature Drift in Data Streams Using Iterative Subset Selection
    Yuan, Lanqin
    Pfahringer, Bernhard
    Barddal, Jean Paul
    APPLIED COMPUTING REVIEW, 2019, 19 (01): 20-33
  • [2] Boosting decision stumps for dynamic feature selection on data streams
    Barddal, Jean Paul
    Enembreck, Fabricio
    Gomes, Heitor Murilo
    Bifet, Albert
    Pfahringer, Bernhard
    INFORMATION SYSTEMS, 2019, 83: 13-29
  • [3] Decision tree-based Feature Ranking in Concept Drifting Data Streams
    Pereira Karax, Jean Antonio
    Malucelli, Andreia
    Barddal, Jean Paul
    SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019: 590-592
  • [4] A conservative feature subset selection algorithm with missing data
    Aussem, Alex
    de Morais, Sergio Rodrigues
    NEUROCOMPUTING, 2010, 73 (4-6): 585-590
  • [5] Feature subset selection and ranking for data dimensionality reduction
    Wei, Hua-Liang
    Billings, Stephen A.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2007, 29 (01): 162-166
  • [6] Merit-guided dynamic feature selection filter for data streams
    Barddal, Jean Paul
    Enembreck, Fabricio
    Gomes, Heitor Murilo
    Bifet, Albert
    Pfahringer, Bernhard
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 116: 227-242
  • [7] Overview Of Feature Subset Selection Algorithm For High Dimensional Data
    Gandhi, Swati S.
    Prabhune, S. S.
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INVENTIVE SYSTEMS AND CONTROL (ICISC 2017), 2017: 618-623
  • [8] Feature Selection on High Dimensional Data using Wrapper Based Subset Selection
    Manikandan, G.
    Susi, E.
    Abirami, S.
    2017 SECOND INTERNATIONAL CONFERENCE ON RECENT TRENDS AND CHALLENGES IN COMPUTATIONAL MODELS (ICRTCCM), 2017: 320-325
  • [9] Wrappers for feature subset selection
    Kohavi, R
    John, GH
    ARTIFICIAL INTELLIGENCE, 1997, 97 (1-2): 273-324
  • [10] Online Feature Screening for Data Streams With Concept Drift
    Wang, Mingyuan
    Barbu, Adrian
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (11): 11693-11707