Feature selection for online streaming high-dimensional data: A state-of-the-art review

Cited by: 22
Authors
Zaman, Ezzatul Akmal Kamaru [1]
Mohamed, Azlinah [2]
Ahmad, Azlin [1]
Affiliations
[1] Univ Teknol MARA, Fac Comp & Math Sci, Shah Alam, Selangor, Malaysia
[2] Univ Teknol MARA UiTM, Inst Big Data Analyt & Artificial Intelligence IBD, Shah Alam, Selangor, Malaysia
Keywords
Online feature selection; Streaming features; Feature relevancy; Feature redundancy; High-dimensional data; Feature drift; UNSUPERVISED FEATURE-SELECTION; LABEL FEATURE-SELECTION; TEXT FEATURE-SELECTION; CLASSIFICATION; ALGORITHM; DRIFT; REDUCTION; FILTER; SETS
DOI
10.1016/j.asoc.2022.109355
CLC number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge discovery from streaming data requires online feature selection (OFS) to reduce the complexity of real-world datasets and significantly improve the learning process. OFS achieves this by selecting highly relevant feature subsets while minimising irrelevant and redundant features. However, researchers have difficulty handling the various forms that streaming data can take. The goal of this article is to present a state-of-the-art review of feature subset selection for high-dimensional online streaming data, organised by data form. Through a systematic literature review of journal and conference papers from the past five years, traditional feature selection and OFS were discussed in detail. A taxonomy of the challenges related to OFS then provided a comprehensive review of state-of-the-art OFS methods and the benchmark methods. Based on the extensive review, several data forms were identified: group stream, multi-label, capricious, imbalanced, and feature drift. Through critical analysis, the evaluation metrics of OFS methods were compared from the perspectives of threshold initialisation, accuracy, high dimensionality, running time, relevancy, and redundancy for the optimal feature subset. An OFS framework was derived to illustrate the relationship between the application area, data form, OFS methods, evaluation metrics, and tools. Finally, the findings and potential directions for future research were discussed in depth. Future researchers are encouraged to explore the derived framework and to advance each method. (C) 2022 Elsevier B.V. All rights reserved.
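The relevance-then-redundancy idea in the abstract (keep a newly arrived feature only if it is relevant to the class and not redundant with features already kept) can be illustrated with a minimal sketch. This is not a method from the review: using absolute Pearson correlation as both the relevance and the redundancy measure, the thresholds 0.3 and 0.9, and the names `StreamingFeatureSelector`, `offer`, and `pearson` are all hypothetical choices made for illustration. The setting assumed is the streaming-features one, where the instances are fixed and features are revealed one at a time.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


class StreamingFeatureSelector:
    """Hypothetical online filter: instances are fixed, features arrive one at a time."""

    def __init__(self, labels: Sequence[float],
                 relevance_threshold: float = 0.3,
                 redundancy_threshold: float = 0.9) -> None:
        self.labels = list(labels)
        self.relevance_threshold = relevance_threshold    # assumed value
        self.redundancy_threshold = redundancy_threshold  # assumed value
        self.selected: Dict[str, List[float]] = {}
        self.relevance: Dict[str, float] = {}

    def offer(self, name: str, values: Sequence[float]) -> bool:
        """Evaluate one newly arrived feature; return True if it joins the subset."""
        values = list(values)
        rel = abs(pearson(values, self.labels))
        # Relevance analysis: discard features weakly correlated with the label.
        if rel < self.relevance_threshold:
            return False
        # Redundancy analysis: find kept features that nearly duplicate this one.
        redundant = [other for other, other_values in self.selected.items()
                     if abs(pearson(values, other_values)) >= self.redundancy_threshold]
        # Reject the newcomer if an equally or more relevant duplicate is already kept.
        if any(self.relevance[other] >= rel for other in redundant):
            return False
        # Otherwise the newcomer replaces the weaker duplicates it makes redundant.
        for other in redundant:
            del self.selected[other]
            del self.relevance[other]
        self.selected[name] = values
        self.relevance[name] = rel
        return True


# Toy stream: six labelled instances, features revealed one by one.
labels = [0, 0, 1, 1, 0, 1]
stream = [
    ("f_a", [0.1, 0.2, 0.9, 0.8, 0.15, 0.95]),  # strongly relevant
    ("noise", [2, 0, 2, 0, 1, 1]),               # uncorrelated with the label
    ("f_b", [0.2, 0.2, 0.8, 0.8, 0.15, 0.95]),  # near-copy of f_a, slightly less relevant
    ("f_c", [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]),     # near-copy of f_a, more relevant
    ("f_d", [0, 1, 1, 1, 0, 0]),                 # moderately relevant, not redundant
]
selector = StreamingFeatureSelector(labels)
accepted = [selector.offer(name, values) for name, values in stream]
print(accepted)                 # [True, False, False, True, True]
print(list(selector.selected))  # ['f_c', 'f_d']
```

In this toy run, `noise` fails the relevance check, `f_b` is dropped because `f_a` already covers it, and `f_c` displaces `f_a` as the stronger of two near-duplicates. Changing either threshold changes which features survive, which is one reason threshold initialisation is a meaningful comparison criterion for OFS methods.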
Pages: 27
Related papers
50 in total
  • [1] The state-of-the-art on tours for dynamic visualization of high-dimensional data
    Lee, Stuart
    Cook, Dianne
    da Silva, Natalia
    Laa, Ursula
    Spyrison, Nicholas
    Wang, Earo
    Zhang, H. Sherry
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2022, 14 (04)
  • [2] A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis
    Borah, Kasmika
    Das, Himanish Shekhar
    Seth, Soumita
    Mallick, Koushik
    Rahaman, Zubair
    Mallik, Saurav
    FUNCTIONAL & INTEGRATIVE GENOMICS, 2024, 24 (05)
  • [3] On online high-dimensional spherical data clustering and feature selection
    Amayri, Ola
    Bouguila, Nizar
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (04) : 1386 - 1398
  • [4] A filter feature selection for high-dimensional data
    Janane, Fatima Zahra
    Ouaderhman, Tayeb
    Chamlal, Hasna
    JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2023, 17
  • [5] Filter Feature Selection Performance Comparison in High-dimensional Data
    Huertas, Carlos
    Juarez-Ramirez, Reyes
    2014 17TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2014
  • [6] Feature selection for high-dimensional data
    Bolón-Canedo, V.
    Sánchez-Maroño, N.
    Alonso-Betanzos, A.
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2016, 5 (2) : 65 - 75
  • [7] Automated online feature selection and learning from high-dimensional streaming data using an ensemble of Kohonen neurons
    Roy, Asim
    2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015
  • [8] Online Streaming Feature Selection for High-Dimensional and Class-Imbalanced Data Based on Neighborhood Rough Set
    Chen, X.
    Lin, Y.
    Wang, C.
    MOSHI SHIBIE YU RENGONG ZHINENG/PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2019, 32 (08) : 726 - 735
  • [9] Online feature selection for high-dimensional class-imbalanced data
    Zhou, Peng
    Hu, Xuegang
    Li, Peipei
    Wu, Xindong
    KNOWLEDGE-BASED SYSTEMS, 2017, 136 : 187 - 199
  • [10] Feature selection in multimedia: The state-of-the-art review
    Lee, Pui Yi
    Loh, Wei Ping
    Chin, Jeng Feng
    IMAGE AND VISION COMPUTING, 2017, 67 : 29 - 42