Feature selection for online streaming high-dimensional data: A state-of-the-art review

Times cited: 22

Authors:

Zaman, Ezzatul Akmal Kamaru [1]

Mohamed, Azlinah [2]

Ahmad, Azlin [1]

Affiliations:

[1] Univ Teknol MARA, Fac Comp & Math Sci, Shah Alam, Selangor, Malaysia

[2] Univ Teknol MARA UiTM, Inst Big Data Analyt & Artificial Intelligence IBD, Shah Alam, Selangor, Malaysia

Source:

APPLIED SOFT COMPUTING | 2022 / Vol. 127

Keywords:

Online feature selection; Streaming features; Feature relevancy; Feature redundancy; High-dimensional data; Feature drift; UNSUPERVISED FEATURE-SELECTION; LABEL FEATURE-SELECTION; TEXT FEATURE-SELECTION; CLASSIFICATION; ALGORITHM; DRIFT; REDUCTION; FILTER; SETS;

DOI:

10.1016/j.asoc.2022.109355

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Knowledge discovery from streaming data requires online feature selection (OFS) to reduce the complexity of real-world datasets and significantly improve the learning process. This is achieved by selecting highly relevant feature subsets while minimising irrelevant and redundant features. However, researchers have difficulty handling the various forms that such data can take. The goal of this article is to present a state-of-the-art review of feature subset selection, organised by data form, for high-dimensional data in online streaming settings. Through a systematic literature review of journal and conference papers from the past five years, traditional feature selection and online feature selection were discussed in detail. Subsequently, a taxonomy of the challenges related to OFS was used to frame a comprehensive review of state-of-the-art OFS and benchmark methods. Several data forms were identified in the review: group stream, multi-label, capricious, imbalanced, and feature drift. Through critical analysis, the evaluation metrics of OFS methods were compared in terms of threshold initialisation, accuracy, high dimensionality, running time, and the relevancy and redundancy of the optimal feature subset. An OFS framework was derived to illustrate the relationships among application areas, data forms, OFS methods, evaluation metrics, and tools. Finally, the findings and potential directions for future research were thoroughly discussed. Future researchers are encouraged to explore the derived framework and to advance each method within it. (C) 2022 Elsevier B.V. All rights reserved.
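The abstract frames OFS as keeping relevant features and discarding irrelevant and redundant ones as features arrive over time. The sketch below illustrates this idea under stated assumptions. It is not the method of this review or of any surveyed paper.

- Features arrive one at a time as `(index, column)` pairs.
- Both relevance and redundancy are measured with absolute Pearson correlation. This is a deliberately simple swapped-in criterion; published OFS methods typically use others, such as conditional independence tests or mutual information.
- The two thresholds are fixed by hand. This is the threshold-initialisation issue the review raises.

The function name `online_streaming_feature_selection` and the parameters `relevance_threshold` and `redundancy_threshold` are hypothetical.

```python
import numpy as np


def online_streaming_feature_selection(feature_stream, y,
                                       relevance_threshold=0.1,
                                       redundancy_threshold=0.9):
    """Hypothetical sketch of online streaming feature selection.

    feature_stream yields (index, column) pairs one at a time.
    Relevance and redundancy are both |Pearson r| (an assumption made
    for illustration only). Returns the sorted indices kept so far.
    """
    y = np.asarray(y, dtype=float)
    selected = {}  # feature index -> column values

    def abs_corr(a, b):
        # Correlation is undefined for constant columns; treat it as 0.
        if np.std(a) == 0 or np.std(b) == 0:
            return 0.0
        return abs(np.corrcoef(a, b)[0, 1])

    for idx, x in feature_stream:
        x = np.asarray(x, dtype=float)
        rel = abs_corr(x, y)
        # Step 1: relevance analysis; drop weakly relevant features at once.
        if rel < relevance_threshold:
            continue
        # Step 2: redundancy analysis against the current subset.
        redundant_with = [j for j, s in selected.items()
                          if abs_corr(x, s) >= redundancy_threshold]
        # Reject x if a kept feature it duplicates is at least as relevant.
        if any(abs_corr(selected[j], y) >= rel for j in redundant_with):
            continue
        # Otherwise x replaces every weaker feature it duplicates.
        for j in redundant_with:
            del selected[j]
        selected[idx] = x
    return sorted(selected)


if __name__ == "__main__":
    y = np.arange(1, 9, dtype=float)
    near_dup = np.array([1, 2, 3, 4, 5, 6, 7, 9], dtype=float)   # |r| with y is about 0.994
    exact = y.copy()                                              # |r| with y is 1
    noise = np.array([1, -1, -1, 1, 1, -1, -1, 1], dtype=float)   # orthogonal to y, so |r| is 0
    stream = enumerate([near_dup, exact, noise])
    print(online_streaming_feature_selection(stream, y))  # prints [1]
```

Arrival order matters in this sketch. If the exact copy of the label arrived first, the near-duplicate would be rejected as redundant instead of displacing an earlier choice. This is one way streaming selection can differ from selecting over the full feature set at once.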

Pages: 27