Optimizing the Hybrid Feature Selection in the DNA Microarray for Cancer Diagnosis Using Fuzzy Entropy and the Giza Pyramid Construction Algorithm

Cited by: 0
Authors
Motevalli, Masoumeh [1]
Khalilian, Madjid [1]
Bastanfard, Azam [1]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Karaj Branch, Karaj, Iran
Keywords
Cancer diagnosis; microarray data; gene representation; feature selection; metaheuristics; fuzzy entropy; GENE-EXPRESSION DATA; CLASSIFICATION; SEARCH; OPTIMIZATION
DOI
10.1142/S1469026824500317
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Biotechnological analysis of DNA microarray genes provides valuable insights into the discovery and treatment of diseases such as cancer. It may also be crucial for the prevention and treatment of other genetic diseases. However, because a DNA microarray has a very large number of features (genes), the "curse of dimensionality" is very common. Many machine learning methods require an effective subset of input genes to achieve high accuracy. Unfortunately, selecting an optimal subset of features (genes) is an inherently NP-hard problem. Recently, the use of metaheuristics to cope with the NP-hardness of feature selection has attracted the attention of many researchers. In this paper, we combine fuzzy entropy and the Giza Pyramid Construction (GPC) algorithm for feature selection. First, redundant features in the microarray dataset are removed using the fuzzy entropy approach. GPC is then used to reduce the execution time. This results in the selection of a near-optimal subset of genes for cancer detection. Dimensionality reduction with GPC followed by classification with a Convolutional Neural Network (CNN) creates a synergy that increases efficiency. The proposed method is tested on five well-known cancer datasets: leukemia, lymphoma, MLL, ovarian, and SRBCT. The performance of the CNN was also compared with that of four well-known classifiers: K-nearest neighbor, naïve Bayes, decision tree, and logistic regression. Our results show that, on average, CNN has the highest accuracy, recall, precision, and F-measure in all datasets.
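The fuzzy-entropy filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulation, so everything below is an assumption made for illustration: each gene is scored by the mean Shannon entropy of Gaussian class memberships centred on the per-class means, and the hypothetical helpers `fuzzy_entropy_score` and `select_top_k` keep the genes with the lowest scores. The GPC search and the CNN classifier are not reproduced here.

```python
import math


def fuzzy_entropy_score(values, labels):
    """Mean entropy of fuzzy class memberships for one gene.

    Assumed formulation (not taken from the paper): the membership of a
    sample in a class is a Gaussian of its distance to that class's mean,
    normalised over classes. Lower scores mean clearer class separation.
    """
    classes = sorted(set(labels))
    n = len(values)
    mean_all = sum(values) / n
    var_all = sum((v - mean_all) ** 2 for v in values) / n
    if var_all == 0.0:
        # A constant gene carries no class information: maximum entropy.
        return math.log(len(classes))

    # Per-class mean expression level of this gene.
    centers = {}
    for c in classes:
        members = [v for v, y in zip(values, labels) if y == c]
        centers[c] = sum(members) / len(members)

    total = 0.0
    for v in values:
        exponents = [-((v - centers[c]) ** 2) / (2.0 * var_all) for c in classes]
        shift = max(exponents)  # shift before exp() to avoid underflow
        raw = [math.exp(e - shift) for e in exponents]
        norm = sum(raw)
        memberships = [r / norm for r in raw]
        total -= sum(m * math.log(m) for m in memberships if m > 0.0)
    return total / n


def select_top_k(rows, labels, k):
    """Return the indices of the k genes with the lowest fuzzy entropy."""
    n_genes = len(rows[0])
    scores = [
        fuzzy_entropy_score([row[j] for row in rows], labels)
        for j in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda j: scores[j])[:k]


if __name__ == "__main__":
    # Toy data: gene 0 separates the two classes, gene 1 does not.
    rows = [[0.0, 1.0], [0.1, 2.0], [0.2, 3.0],
            [5.0, 1.0], [5.1, 2.0], [5.2, 3.0]]
    labels = [0, 0, 0, 1, 1, 1]
    print(select_top_k(rows, labels, 1))
```

In a pipeline like the one the abstract outlines, a filter of this kind would only shrink the candidate gene set; a metaheuristic such as GPC would then search within the reduced set.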
Pages: 33