A comprehensive learning based swarm optimization approach for feature selection in gene expression data

Cited by: 2
Authors
Easwaran, Subha [1]
Venugopal, Jothi Prakash [2]
Subramanian, Arul Antran Vijay [3]
Sundaram, Gopikrishnan [4]
Naseeba, Beebi [4]
Affiliations
[1] Karpagam Coll Engn, Dept Sci & Humanities, Coimbatore 641032, Tamil Nadu, India
[2] Karpagam Coll Engn, Dept Informat Technol, Coimbatore 641032, Tamil Nadu, India
[3] Karpagam Coll Engn, Dept Comp Sci & Engn, Coimbatore 641032, Tamil Nadu, India
[4] VIT AP Univ, Sch Comp Sci & Engn, Amaravathi 522241, Andhra Pradesh, India
Keywords
Comprehensive learning; Feature selection; Gene expression; Gene selection; Swarm intelligence; Cancer classification; Microarray; Classification
DOI
10.1016/j.heliyon.2024.e37165
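CLC number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract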
Gene expression data analysis is challenging due to the high dimensionality and complexity of the data. Feature selection, which identifies relevant genes, is a common preprocessing step. We propose a Comprehensive Learning-Based Swarm Optimization (CLBSO) approach for feature selection in gene expression data. CLBSO leverages the strengths of ants and grasshoppers to efficiently explore the high-dimensional search space. Ants perform local search and leave pheromone trails to guide the swarm, while grasshoppers use their ability to jump long distances to explore new regions and avoid local optima. The proposed approach was evaluated on several publicly available gene expression datasets and compared with state-of-the-art feature selection methods. CLBSO achieved an average accuracy improvement of 15% over the original high-dimensional data and outperformed other feature selection methods by up to 10%. For instance, in the Pancreatic cancer dataset, CLBSO achieved 97.2% accuracy, significantly higher than XGBoost-MOGA's 84.0%. Convergence analysis showed CLBSO required fewer iterations to reach optimal solutions. Statistical analysis confirmed significant performance improvements, and stability analysis demonstrated consistent gene subset selection across different runs. These findings highlight the robustness and efficacy of CLBSO in handling complex gene expression datasets, making it a valuable tool for enhancing classification tasks in bioinformatics.
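The abstract's description of ants laying pheromone trails and grasshoppers making long jumps can be illustrated with a small sketch. This is not the authors' CLBSO implementation: the update rule, the parameters (`evaporation`, `jump_prob`, `n_agents`), and the functions `hybrid_select` and `toy_fitness` are all hypothetical choices made for illustration, and the fitness function is a stand-in for a real classifier's accuracy.

```python
import random


def toy_fitness(mask, informative):
    """Hypothetical stand-in for classifier accuracy: rewards selecting the
    informative features and lightly penalizes subset size."""
    hits = sum(1 for i in informative if mask[i])
    return hits / len(informative) - 0.01 * sum(mask)


def hybrid_select(n_features, fitness, n_agents=20, n_iters=50,
                  evaporation=0.1, jump_prob=0.2, seed=0):
    """Illustrative pheromone-guided search with occasional long jumps.
    All parameter names and defaults are assumptions, not from the paper."""
    rng = random.Random(seed)
    pheromone = [0.5] * n_features  # per-feature selection probability
    best_mask, best_fit = None, float("-inf")
    for _ in range(n_iters):
        for _ in range(n_agents):
            # "Ant" step: sample each feature according to its pheromone level.
            mask = [rng.random() < p for p in pheromone]
            # "Grasshopper" step: occasionally flip a quarter of the bits
            # to jump to a distant region of the search space.
            if rng.random() < jump_prob:
                k = max(1, n_features // 4)
                for i in rng.sample(range(n_features), k):
                    mask[i] = not mask[i]
            fit = fitness(mask)
            if fit > best_fit:
                best_mask, best_fit = mask, fit
        # Evaporate the trail, then reinforce features in the best subset so far.
        # Clamping keeps every feature reachable in later iterations.
        pheromone = [
            min(0.95, max(0.05, (1 - evaporation) * p
                          + evaporation * (1.0 if chosen else 0.0)))
            for p, chosen in zip(pheromone, best_mask)
        ]
    return best_mask, best_fit


if __name__ == "__main__":
    informative = [0, 5, 10]  # hypothetical indices of relevant genes
    mask, fit = hybrid_select(30, lambda m: toy_fitness(m, informative))
    print(sorted(i for i, chosen in enumerate(mask) if chosen), round(fit, 3))
```

In a real setting, `toy_fitness` would be replaced by cross-validated classifier accuracy on the selected gene columns, which is far more expensive per evaluation and would likely call for a smaller swarm or fewer iterations.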
Pages: 20
Related papers
50 in total
  • [41] Feature selection based on niching particle swarm optimization for omics data classification
    Xu, Zhao
    Yang, Junshan
    2020 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND HUMAN-COMPUTER INTERACTION (ICHCI 2020), 2020, : 130 - 133
  • [42] A new distributed feature selection technique for classifying gene expression data
    Ayyad, Sarah M.
    Saleh, Ahmed I.
    Labib, Labib M.
    INTERNATIONAL JOURNAL OF BIOMATHEMATICS, 2019, 12 (04)
  • [43] Data mining for feature selection in gene expression autism data
    Latkowski, Tomasz
    Osowski, Stanislaw
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (02) : 864 - 872
  • [44] Breast cancer diagnosis using thermal image analysis: A data-driven approach based on swarm intelligence and supervised learning for optimized feature selection
    Macedo, Mariana
    Santana, Maira
    Santos, Wellington P. dos
    Menezes, Ronaldo
    Bastos-Filho, Carmelo
    APPLIED SOFT COMPUTING, 2021, 109
  • [45] Feature selection using particle swarm optimization-based logistic regression model
    Qasim, Omar Saber
    Algamal, Zakariya Yahya
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2018, 182 : 41 - 46
  • [46] Robust microarray data feature selection using a correntropy based distance metric learning approach
    Vahabzadeh, Venus
    Moattar, Mohammad Hossein
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 161
  • [47] Review on the Usage of Swarm Intelligence in Gene Expression Data
    Zamri, Nurhawani Ahmad
    Thangavel, Bhuvaneswari
    Ab Aziz, Nor Azlina
    Aziz, Nor Hidayati Abdul
    2ND INTERNATIONAL CONFERENCE FOR INNOVATION IN BIOMEDICAL ENGINEERING AND LIFE SCIENCES, 2018, 67 : 153 - 160
  • [48] An Efficient Feature Selection Technique for Gene Expression Data
    Chandra, B.
    2018 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB), 2018, : 132 - 137
  • [49] A Review on Feature Selection Techniques for Gene Expression Data
    Vanjimalar, S.
    Ramyachitra, D.
    Manikandan, P.
    2018 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (IEEE ICCIC 2018), 2018, : 26 - 29
  • [50] Feature Selection and Classification in gene expression cancer data
    Pavithra, D.
    Lakshmanan, B.
    2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN DATA SCIENCE (ICCIDS), 2017