FG-HFS: A feature filter and group evolution hybrid feature selection algorithm for high-dimensional gene expression data

Cited by: 12
Authors
Xu, Zhaozhao [1]
Yang, Fangyuan [2]
Tang, Chaosheng [1]
Wang, Hong [2]
Wang, Shuihua [3, 4]
Sun, Junding [1]
Zhang, Yudong [1, 3, 5]
Affiliations
[1] Henan Polytech Univ, Sch Comp Sci & Technol, Jiaozuo 454000, Henan, Peoples R China
[2] Henan Polytech Univ, Dept Gynecol Oncol, Affiliated Hosp 1, Jiaozuo 454000, Henan, Peoples R China
[3] Univ Leicester, Sch Comp & Math Sci, Leicester LE1 7RH, England
[4] Xian Jiaotong Liverpool Univ, Dept Biol Sci, Suzhou 215123, Jiangsu, Peoples R China
[5] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
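Funding
National Natural Science Foundation of China; Biotechnology and Biological Sciences Research Council (UK); Medical Research Council (UK);
Keywords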
Gene expression data; Feature selection; Spectral clustering; Symmetric uncertainty; Multi-objective genetic algorithm; CLASSIFICATION; FRAMEWORK; MACHINE;
DOI
10.1016/j.eswa.2023.123069
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Gene expression data are characterized by high dimensionality and small sample sizes, and they contain a large number of genes unrelated to disease. Feature selection improves the efficiency of disease diagnosis by selecting a small number of important genes. Unfortunately, existing algorithms do not consider the correlation between features, and search algorithms tend to fall into local optima during the feature search process. To this end, this paper proposes a feature filter and group evolution hybrid feature selection algorithm (FG-HFS) for high-dimensional gene expression data. Unlike existing algorithms, we propose using spectral clustering to place redundant features in the same group. Then, we propose a redundant feature filter algorithm: following the principle of the approximate Markov blanket, each feature group is filtered to delete its redundant features. The filtered features are then evenly divided by density according to the feature exponential strategy. Most importantly, we propose using a group evolution multi-objective genetic algorithm to search the filtered feature subsets and to evaluate the candidate feature subsets according to the in-group and out-group, so as to select the feature subsets with the highest accuracy and the fewest features. Experimental results show that the average accuracy (ACC) and Matthews correlation coefficient (MCC) of the feature subsets (FSs) selected by the FG-HFS algorithm on 5 gene expression datasets are 92.76% and 88.76%, respectively, which are significantly better than those of the existing algorithms. In addition, the FSs and ACC/FSs indexes of the FG-HFS algorithm are also better than those of the existing algorithms, which further demonstrates the superiority of the FG-HFS algorithm. More importantly, the results of the Wilcoxon and Friedman statistical tests show that the feature selection performance of the FG-HFS algorithm is significantly better than that of existing algorithms in both pairwise and multiple comparisons.
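The abstract names symmetric uncertainty and the approximate Markov blanket but does not give the paper's exact filtering procedure. As an illustration only, the sketch below uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) and the approximate Markov blanket test known from FCBF-style filters: a kept feature F_i covers F_j when SU(F_i, C) >= SU(F_j, C) and SU(F_i, F_j) >= SU(F_j, C). The function names, the dictionary input format, and the assumption that expression values have already been discretized are all hypothetical and not taken from the paper; spectral clustering and the genetic search are not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy
    return 2.0 * mutual_info / (hx + hy)


def filter_redundant(features, labels):
    """Drop features covered by an approximate Markov blanket (FCBF-style).

    features: dict mapping a (hypothetical) feature name to a list of
    discretized values; labels: list of class labels of the same length.
    """
    relevance = {
        name: symmetric_uncertainty(col, labels)
        for name, col in features.items()
    }
    # Visit features from most to least relevant to the class.
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    kept = []
    for name in ranked:
        # Every kept feature is at least as relevant as `name`, so only
        # the feature-to-feature condition still needs to be checked.
        redundant = any(
            symmetric_uncertainty(features[k], features[name]) >= relevance[name]
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


labels = [0, 0, 1, 1]
features = {
    "g1": [0, 0, 1, 1],  # fully informative about the class
    "g2": [1, 1, 0, 0],  # inverted copy of g1, hence redundant
    "g3": [0, 1, 0, 1],  # independent of the class
}
print(filter_redundant(features, labels))
```

In this toy input, `g2` carries the same information as `g1` and `g3` carries none about the class, so only `g1` survives. Real gene expression values are continuous, so a discretization step (not specified in this record) would be needed before computing entropies this way.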
Pages: 16
Related papers
50 in total
  • [31] Feature selection for high-dimensional temporal data
    Tsagris, Michail
    Lagani, Vincenzo
    Tsamardinos, Ioannis
    BMC BIOINFORMATICS, 2018, 19
  • [32] FEATURE SELECTION FOR HIGH-DIMENSIONAL DATA ANALYSIS
    Verleysen, Michel
    ECTA 2011/FCTA 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON EVOLUTIONARY COMPUTATION THEORY AND APPLICATIONS AND INTERNATIONAL CONFERENCE ON FUZZY COMPUTATION THEORY AND APPLICATIONS, 2011,
  • [33] FEATURE SELECTION FOR HIGH-DIMENSIONAL DATA ANALYSIS
    Verleysen, Michel
    NCTA 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON NEURAL COMPUTATION THEORY AND APPLICATIONS, 2011, : IS23 - IS25
  • [34] A hybrid Artificial Immune optimization for high-dimensional feature selection
    Zhu, Yongbin
    Li, Wenshan
    Li, Tao
    KNOWLEDGE-BASED SYSTEMS, 2023, 260
  • [35] An Efficient Hybrid Feature Selection Method Using the Artificial Immune Algorithm for High-Dimensional Data
    Zhu, Yongbin
    Li, Tao
    Li, Wenshan
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [36] Hybrid binary COOT algorithm with simulated annealing for feature selection in high-dimensional microarray data
    Elnaz Pashaei
    Elham Pashaei
    Neural Computing and Applications, 2023, 35 : 353 - 374
  • [37] Optimal Bayesian Feature Selection on High Dimensional Gene Expression Data
    Pour, Ali Foroughi
    Dalton, Lori A.
    2014 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2014, : 1402 - 1405
  • [38] A New Evolutionary Multitasking Algorithm for High-Dimensional Feature Selection
    Liu, Ping
    Xu, Bangxin
    Xu, Wenwen
    IEEE ACCESS, 2024, 12 : 89856 - 89872
  • [39] Feature Selection with High-Dimensional Imbalanced Data
    Van Hulse, Jason
    Khoshgoftaar, Taghi M.
    Napolitano, Amri
    Wald, Randall
    2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 507 - 514
  • [40] The feature selection bias problem in relation to high-dimensional gene data
    Krawczuk, Jerzy
    Lukaszuk, Tomasz
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2016, 66 : 63 - 71