Optimizing the Hybrid Feature Selection in the DNA Microarray for Cancer Diagnosis Using Fuzzy Entropy and the Giza Pyramid Construction Algorithm

Cited by: 0
Authors
Motevalli, Masoumeh [1]
Khalilian, Madjid [1]
Bastanfard, Azam [1]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Karaj Branch, Karaj, Iran
Keywords
Cancer diagnosis; microarray data; gene representation; feature selection; metaheuristics; fuzzy entropy; GENE-EXPRESSION DATA; CLASSIFICATION; SEARCH; OPTIMIZATION
DOI
10.1142/S1469026824500317
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Biotechnological analysis of DNA microarray genes provides valuable insights into the discovery and treatment of diseases such as cancer. It may also be crucial for the prevention and treatment of other genetic diseases. However, because a DNA microarray contains a very large number of features, the "curse of dimensionality" is a common problem. Many machine learning methods require an effective subset of input genes to achieve high accuracy. Unfortunately, selecting features (genes) is an inherently NP-hard problem. Recently, the use of metaheuristics to overcome the NP-hardness of the feature selection problem has attracted the attention of many researchers. In this paper, we combine fuzzy entropy with the Giza Pyramid Construction (GPC) algorithm for feature selection. First, redundant features in the microarray dataset are removed using the fuzzy entropy approach. GPC is then used to reduce the execution time. This results in the selection of a near-optimal subset of genes for cancer detection. Dimensionality reduction with GPC followed by classification with a Convolutional Neural Network (CNN) creates a synergy that increases efficiency. The proposed method is tested on five well-known cancer patient datasets: leukemia, lymphoma, MLL, ovarian, and SRBCT. The performance of the CNN was also compared with that of four well-known classifiers: K-nearest neighbor, naïve Bayes, decision tree, and logistic regression. Our results show that, on average, the CNN has the highest accuracy, recall, precision, and F-measure across all datasets.
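The fuzzy-entropy filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulation, so everything below is an assumption made for illustration: memberships are taken as inverse distance to each class mean, the score is the De Luca-Termini fuzzy entropy, and the function names (`class_memberships`, `fuzzy_entropy`, `filter_genes`) are hypothetical. The sketch ranks genes by how crisply they separate the classes; it does not implement GPC or the CNN classifier.

```python
import numpy as np


def class_memberships(gene, labels):
    """Fuzzy membership of every sample in every class for a single gene.

    Assumption: membership is inversely proportional to the distance from
    the class mean, normalized so each sample's memberships sum to 1.
    """
    classes = np.unique(labels)
    centers = np.array([gene[labels == c].mean() for c in classes])
    dist = np.abs(gene[:, None] - centers[None, :]) + 1e-12  # avoid 1/0
    inv = 1.0 / dist
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_entropy(mu):
    """De Luca-Termini fuzzy entropy, averaged over all memberships."""
    mu = np.clip(mu, 1e-12, 1.0 - 1e-12)  # keep log() finite
    return float(-np.mean(mu * np.log(mu) + (1.0 - mu) * np.log(1.0 - mu)))


def filter_genes(X, labels, keep):
    """Return indices of the `keep` lowest-entropy genes and all scores."""
    scores = np.array(
        [fuzzy_entropy(class_memberships(X[:, j], labels))
         for j in range(X.shape[1])]
    )
    order = np.argsort(scores)  # low entropy = crisper class separation
    return order[:keep], scores


if __name__ == "__main__":
    # Synthetic stand-in for a microarray: 40 samples, 5 genes,
    # with only gene 0 carrying class information.
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1], 20)
    X = rng.normal(size=(40, 5))
    X[:, 0] += 20.0 * labels
    selected, scores = filter_genes(X, labels, keep=2)
    print(selected, np.round(scores, 3))
```

In a pipeline like the one the abstract describes, the indices returned by `filter_genes` would define the reduced gene set handed to the metaheuristic search, which is the stage where the search space, and therefore the execution time, shrinks.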
Pages: 33