An improved binary particle swarm optimization algorithm for clinical cancer biomarker identification in microarray data

Cited by: 3
Authors
Yang, Guicheng [1]
Li, Wei [2, 3]
Xie, Weidong [1]
Wang, Linjie [1]
Yu, Kun [4]
Affiliations
[1] Northeastern Univ, Coll Comp Sci & Engn, Shenyang 110000, Liaoning, Peoples R China
[2] Northeastern Univ, Key Lab Intelligent Comp Med Image MIIC, Minist Educ, Shenyang 110000, Liaoning, Peoples R China
[3] Natl Frontiers Sci Ctr Ind Intelligence & Syst Op, Shenyang 110819, Peoples R China
[4] Northeastern Univ, Coll Med & Bioinformat Engn, Shenyang 110819, Liaoning, Peoples R China
Keywords
Microarray data; Feature selection; Clustering; Particle swarm optimization; Embedded feature elimination; FEATURE-SELECTION ALGORITHM; GENE SELECTION; MOLECULAR CLASSIFICATION; PREDICTION; DISCOVERY;
DOI
10.1016/j.cmpb.2023.107987
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Background and Objective: Microarray data combine a limited number of samples with high-dimensional features, which makes selecting a small number of features for disease diagnosis a challenging problem. Traditional feature selection methods based on evolutionary algorithms struggle to find the optimal feature set within a limited time on such high-dimensional problems. This paper proposes a new solution to these problems. Methods: We propose a hybrid feature selection method (C-IFBPFE) for biomarker identification in microarray data, which combines clustering and improved binary particle swarm optimization while incorporating an embedded feature elimination strategy. First, an adaptive redundant-feature judgment method based on correlation clustering screens features to reduce the search space for the subsequent stage. Second, we propose an improved flipping probability-based binary particle swarm optimization (IFBPSO), which is better suited to binary optimization problems. Finally, we design a new feature elimination (FE) strategy embedded in the binary particle swarm optimization algorithm; it gradually removes poorer features during iterations to reduce the number of features and improve accuracy. Results: We compared C-IFBPFE with other published hybrid feature selection methods on eight public datasets and analyzed the impact of each improvement. The proposed method outperforms other current state-of-the-art feature selection methods in terms of accuracy, number of features, sensitivity, and specificity. The ablation study validates the efficacy of each component; in particular, the proposed feature elimination strategy significantly improves the performance of the algorithm. Conclusions: The hybrid feature selection method proposed in this paper helps address the issue of high-dimensional microarray data with few samples. It can select a small subset of features and achieve high classification accuracy on microarray datasets. Additionally, independent validation of the selected features shows that those chosen by C-IFBPFE correlate strongly with disease phenotypes, and that the method can identify important biomarkers from data related to biomedical problems.
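For orientation, the following is a minimal sketch of classic sigmoid-transfer binary particle swarm optimization applied to a feature mask. It is not the IFBPSO or the feature elimination strategy described in the abstract, whose details are not given in this record. The function names (`binary_pso`, `toy_fitness`), the parameter values, and the toy fitness (reward for three hypothetical "informative" features minus a size penalty) are all assumptions made for illustration; in practice the fitness would typically be a cross-validated classifier score.

```python
import math
import random


def sigmoid(x):
    # Clamp the argument so math.exp cannot overflow.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Classic binary PSO (sigmoid transfer); maximizes `fitness`.

    Each particle is a 0/1 mask over features. All defaults here are
    illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # The velocity sets the probability that the bit becomes 1.
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy problem: features 0, 3 and 7 are "informative".
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)  # penalize larger subsets


best_mask, best_fit = binary_pso(toy_fitness, n_features=12)
print(best_mask, best_fit)
```

The size penalty in `toy_fitness` stands in for the accuracy-versus-subset-size trade-off that the abstract reports; the weight 0.1 is arbitrary.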
Pages: 17