Feature Selection for High-Dimensional Gene Expression Data: A Review

Cited by: 0

Authors:

Baali, Sara ^{[1]}

Hamim, Mohammed ^{[1]}

Moutachaouik, Hicham ^{[1]}

Hain, Mustapha ^{[1]}

El Moudden, Ismail ^{[2]}

Affiliations:

[1] Univ Hassan 2, AICSE Lab, Casablanca, Morocco

[2] Eastern Virginia Med Sch, EVMS Sentara Healthcare Analyt & Delivery Sci Ins, Norfolk, VA USA

Source:

SMART APPLICATIONS AND DATA ANALYSIS, SADASC 2024, PT I | 2024 / Vol. 2167

Keywords:

Gene Expression Data; Feature Selection; Classification; MICROARRAY DATA; CANCER; CLASSIFICATION; PREDICTION;

DOI:

10.1007/978-3-031-77040-1_6

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In recent decades, dealing with high-dimensional data has become an undeniable challenge in most data mining applications. In certain domains, such as bioinformatics, and specifically in microarray data analysis, exploring gene expression data often involves tens of thousands of features (genes) measured across just a few dozen samples. Such scenarios make classical data mining tools difficult to apply, because a significant number of the genes are irrelevant or redundant. In response to this challenge, several feature selection-based approaches have been proposed, each with its advantages and disadvantages. This work introduces a classification of feature selection methods and reviews the state-of-the-art approaches developed over the past five years. Our review reveals a notable trend towards hybrid approaches: approximately 50% of the surveyed studies propose hybrid feature selection techniques, most frequently combining filter with wrapper methods. Additionally, 10-fold cross-validation stands out as the dominant evaluation method, employed by 61.6% of the surveyed approaches. Support Vector Machines emerge as the most favored classification algorithm, demonstrating optimal performance in 77.78% of cases. These findings contribute to the advancement of feature selection approaches, particularly in reducing the dimensionality of gene expression data, thereby enhancing cancer classification methodologies.
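As an illustration of the filter-plus-wrapper pattern the abstract mentions, the following is a minimal sketch and not a method taken from the paper. It assumes synthetic data in place of a real microarray matrix, a t-statistic filter, a greedy forward wrapper, and a nearest-centroid classifier standing in for the Support Vector Machines favored in the surveyed studies. All function names and parameters (`make_toy_data`, `hybrid_select`, `n_filter`, `max_genes`) are hypothetical. The 10-fold cross-validated accuracy serves here as the wrapper's search criterion; because genes are chosen using all samples, the reported value is optimistic, and an unbiased estimate would need the whole selection nested inside an outer validation loop.

```python
import random
from statistics import mean, pvariance

def make_toy_data(n_samples=40, n_genes=200, n_informative=5, seed=0):
    """Synthetic stand-in for a microarray matrix: few samples, many genes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):  # only the first genes carry signal
            row[g] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y

def filter_scores(X, y):
    """Filter step: score each gene by a two-class t-like statistic."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        denom = (pvariance(a) / len(a) + pvariance(b) / len(b)) ** 0.5
        scores.append(abs(mean(a) - mean(b)) / (denom + 1e-12))
    return scores

def centroid_predict(X_train, y_train, x, genes):
    """Nearest-centroid classifier restricted to the chosen genes."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        dist = sum((x[g] - mean(r[g] for r in rows)) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_accuracy(X, y, genes, k=10):
    """k-fold cross-validated accuracy using only the given genes."""
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        test_idx = [i for i in range(len(X)) if i % k == fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for i in test_idx:
            correct += centroid_predict(X_tr, y_tr, X[i], genes) == y[i]
    return correct / len(X)

def hybrid_select(X, y, n_filter=20, max_genes=5, k=10):
    """Keep the top n_filter genes by filter score, then run a greedy
    forward wrapper search driven by k-fold CV accuracy."""
    scores = filter_scores(X, y)
    candidates = sorted(range(len(scores)), key=lambda g: -scores[g])[:n_filter]
    selected, best_acc = [], 0.0
    while len(selected) < max_genes:
        trials = [(cv_accuracy(X, y, selected + [g], k), g)
                  for g in candidates if g not in selected]
        acc, gene = max(trials, key=lambda t: t[0])
        if acc <= best_acc:  # stop when no candidate improves accuracy
            break
        selected.append(gene)
        best_acc = acc
    return selected, best_acc

X, y = make_toy_data()
genes, acc = hybrid_select(X, y)
print(f"selected genes: {genes}, 10-fold CV accuracy: {acc:.2f}")
```

The filter step is cheap and cuts 200 genes to 20 candidates, so the more expensive wrapper search only evaluates a small pool. This division of labor is the usual motivation for combining the two.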

Citation

Pages: 74-92

Page count: 19
