Optimizing the Hybrid Feature Selection in the DNA Microarray for Cancer Diagnosis Using Fuzzy Entropy and the Giza Pyramid Construction Algorithm

Cited by: 0

Authors:

Motevalli, Masoumeh [1]

Khalilian, Madjid [1]

Bastanfard, Azam [1]

Affiliation:

[1] Islamic Azad Univ, Dept Comp Engn, Karaj Branch, Karaj, Iran

Source:

INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS | 2025, Vol. 24, No. 01

Keywords:

Cancer diagnosis; microarray data; gene representation; feature selection; metaheuristics; fuzzy entropy; GENE-EXPRESSION DATA; CLASSIFICATION; SEARCH; OPTIMIZATION;

DOI:

10.1142/S1469026824500317

Chinese Library Classification (CLC):

TP18 [Theory of Artificial Intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Biotechnological analysis of DNA microarray genes provides valuable insights into the discovery and treatment of diseases such as cancer. It may also be crucial for the prevention and treatment of other genetic diseases. However, because a DNA microarray contains a very large number of features, the "curse of dimensionality" problem is very common. Many machine learning methods require an effective subset of input genes to achieve high accuracy. Unfortunately, selecting features (genes) is an inherently NP-hard problem. Recently, the use of metaheuristics to overcome the NP-hardness of the feature selection problem has attracted the attention of many researchers. In this paper, we combine fuzzy entropy with the Giza Pyramid Construction (GPC) algorithm for feature selection. First, redundant features in the microarray dataset are removed using the fuzzy entropy approach. GPC is then used to reduce the execution time. This results in the selection of a near-optimal subset of genes for cancer detection. Dimensionality reduction with GPC followed by classification with a Convolutional Neural Network (CNN) creates a synergy that increases efficiency. The proposed method is tested on five well-known cancer patient datasets: leukemia, lymphoma, MLL, ovarian, and SRBCT. The performance of the CNN was also compared with that of four well-known classifiers: K-nearest neighbor, naïve Bayes, decision tree, and logistic regression. Our results show that, on average, the CNN has the highest accuracy, recall, precision, and F-measure across all datasets.
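The abstract does not give the exact fuzzy entropy definition used, so the following is only a minimal sketch of the first (filter) stage under stated assumptions. It assumes one fuzzy set per class centred on the class mean of a gene, inverse-distance memberships, and a match-degree entropy in which lower scores mark more discriminative genes. The names `fuzzy_entropy` and `select_genes` are hypothetical, and the GPC search and CNN stages are not shown.

```python
import math


def fuzzy_entropy(values, labels, eps=1e-9):
    """Match-degree fuzzy entropy of one gene (assumed formulation, not the paper's)."""
    classes = sorted(set(labels))
    # One fuzzy set per class, centred on that class's mean expression value.
    centers = {
        c: sum(v for v, y in zip(values, labels) if y == c) / labels.count(c)
        for c in classes
    }
    # Inverse-distance memberships, normalised to sum to 1 for every sample.
    memberships = []
    for v in values:
        inv = {c: 1.0 / (abs(v - centers[c]) + eps) for c in classes}
        total = sum(inv.values())
        memberships.append({c: inv[c] / total for c in classes})
    entropy = 0.0
    for fuzzy_set in classes:
        mass = sum(m[fuzzy_set] for m in memberships)
        for c in classes:
            # Share of this fuzzy set's membership mass contributed by class c.
            degree = sum(
                m[fuzzy_set] for m, y in zip(memberships, labels) if y == c
            ) / mass
            if degree > 0:
                entropy -= degree * math.log(degree)
    return entropy


def select_genes(X, y, k):
    """Return the indices of the k genes with the lowest fuzzy entropy."""
    n_genes = len(X[0])
    scores = [fuzzy_entropy([row[j] for row in X], y) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j])[:k]
```

In a pipeline like the one described, the indices returned by `select_genes` would define the reduced search space handed to the metaheuristic; how the paper actually couples the two stages is not stated in the abstract.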


Pages: 33
