Feature Selection for High-Dimensional Gene Expression Data: A Review

Cited by: 0
Authors
Baali, Sara [1 ]
Hamim, Mohammed [1 ]
Moutachaouik, Hicham [1 ]
Hain, Mustapha [1 ]
El Moudden, Ismail [2 ]
Affiliations
[1] Univ Hassan 2, AICSE Lab, Casablanca, Morocco
[2] Eastern Virginia Med Sch, EVMS Sentara Healthcare Analyt & Delivery Sci Ins, Norfolk, VA USA
Source
SMART APPLICATIONS AND DATA ANALYSIS, SADASC 2024, PT I | 2024 / Vol. 2167
Keywords
Gene Expression Data; Feature Selection; Classification; MICROARRAY DATA; CANCER; CLASSIFICATION; PREDICTION;
DOI
10.1007/978-3-031-77040-1_6
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent decades, handling high-dimensional data has become a major challenge in most data mining applications. In certain domains, such as bioinformatics, and specifically in microarray data analysis, exploring gene expression data often involves tens of thousands of features (genes) measured across just a few dozen samples. Such scenarios make classical data mining tools difficult to apply, because a significant number of the genes are irrelevant or redundant. In response to this challenge, several feature selection-based approaches have been proposed, each with its advantages and disadvantages. This work introduces a classification of feature selection methods and reviews the state-of-the-art approaches developed over the past five years. Our review reveals a notable trend towards hybrid approaches: approximately 50% of the surveyed studies propose hybrid feature selection techniques, most frequently combining filter with wrapper methods. Additionally, 10-fold cross-validation stands out as the dominant evaluation method, employed by 61.6% of surveyed approaches. Support Vector Machines emerge as the most favored classification algorithm, demonstrating optimal performance in 77.78% of cases. These findings contribute to the advancement of feature selection approaches, particularly in reducing the dimensionality of gene expression data, thereby enhancing cancer classification methodologies.
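To make the filter-plus-wrapper hybrid mentioned in the abstract concrete, the following is a minimal sketch and not a method taken from the reviewed paper. It assumes a signal-to-noise score as the filter, greedy forward selection as the wrapper, and a nearest-centroid classifier standing in for an SVM so that the example needs only the Python standard library. All function names and the toy data are hypothetical, and the toy data is scored with 5 folds rather than the 10 reported as dominant in the review, because it has only 10 samples.

```python
from statistics import mean, pstdev


def snr_score(column, labels):
    """Filter criterion (assumed): |mean difference| / (sum of class std devs)."""
    a = [v for v, t in zip(column, labels) if t == 0]
    b = [v for v, t in zip(column, labels) if t == 1]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9)


def filter_top_k(X, y, k):
    """Filter step: rank genes independently of any classifier, keep the top k."""
    n_genes = len(X[0])
    scores = [snr_score([row[j] for row in X], y) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: -scores[j])[:k]


def centroid_predict(train_X, train_y, x, genes):
    """Nearest-centroid classifier (a stand-in for an SVM) on the chosen genes."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        dist = sum((x[g] - mean(r[g] for r in rows)) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, genes, folds):
    """k-fold cross-validation accuracy; sample i goes to fold i % folds."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        correct += sum(centroid_predict(tX, ty, X[i], genes) == y[i] for i in test)
    return correct / len(X)


def wrapper_forward(X, y, candidates, folds):
    """Wrapper step: greedily add the gene that most improves CV accuracy."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        best_gene = None
        for g in candidates:
            if g in selected:
                continue
            acc = cv_accuracy(X, y, selected + [g], folds)
            if acc > best:
                best, best_gene, improved = acc, g, True
        if improved:
            selected.append(best_gene)
    return selected, best


# Toy data (hypothetical): 10 samples x 4 genes; only gene 0 separates the classes.
X = [
    [1.0, 2.0, 0.5, 7.0], [5.0, 3.0, 0.6, 7.1],
    [1.2, 2.5, 0.9, 7.4], [5.2, 3.5, 0.2, 6.7],
    [0.8, 3.0, 0.1, 6.6], [4.8, 2.4, 0.8, 7.3],
    [1.1, 2.2, 0.7, 7.2], [5.1, 3.2, 0.4, 6.9],
    [0.9, 2.8, 0.3, 6.8], [4.9, 3.8, 0.5, 7.0],
]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

top = filter_top_k(X, y, k=2)                          # cheap filter pass
selected, acc = wrapper_forward(X, y, top, folds=5)    # costly wrapper pass
print(top, selected, acc)
```

The design point is the ordering of the two stages: the filter cuts the candidate set cheaply, so the wrapper, which re-runs cross-validation for every candidate subset, only searches among a handful of genes.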
Pages: 74-92
Page count: 19