FG-HFS: A feature filter and group evolution hybrid feature selection algorithm for high-dimensional gene expression data

Cited by: 12
Authors
Xu, Zhaozhao [1]
Yang, Fangyuan [2]
Tang, Chaosheng [1]
Wang, Hong [2]
Wang, Shuihua [3, 4]
Sun, Junding [1]
Zhang, Yudong [1, 3, 5]
Affiliations
[1] Henan Polytech Univ, Sch Comp Sci & Technol, Jiaozuo 454000, Henan, Peoples R China
[2] Henan Polytech Univ, Dept Gynecol Oncol, Affiliated Hosp 1, Jiaozuo 454000, Henan, Peoples R China
[3] Univ Leicester, Sch Comp & Math Sci, Leicester LE1 7RH, England
[4] Xian Jiaotong Liverpool Univ, Dept Biol Sci, Suzhou 215123, Jiangsu, Peoples R China
[5] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
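Funding
National Natural Science Foundation of China; UK Biotechnology and Biological Sciences Research Council; UK Medical Research Council
Keywords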
Gene expression data; Feature selection; Spectral clustering; Symmetric uncertainty; Multi-objective genetic algorithm; CLASSIFICATION; FRAMEWORK; MACHINE;
DOI
10.1016/j.eswa.2023.123069
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Gene expression data are characterized by high dimensionality and small sample sizes, and they contain a large number of genes unrelated to disease. Feature selection improves the efficiency of disease diagnosis by selecting a small number of important genes. Unfortunately, existing algorithms do not consider the correlation between features, and search algorithms tend to fall into local optima during the feature search. To this end, this paper proposes a feature filter and group evolution hybrid feature selection algorithm (FG-HFS) for high-dimensional gene expression data. Unlike existing algorithms, we use spectral clustering to gather redundant features into groups. We then propose a redundant feature filter algorithm that, following the principle of the approximate Markov blanket, filters each feature group to delete its redundant features. The filtered features are then evenly divided by density according to the feature exponential strategy. Most importantly, we propose a group evolution multi-objective genetic algorithm that searches the filtered feature subsets and evaluates candidate feature subsets according to the in-group and out-group, so as to select the feature subsets with the highest accuracy and the fewest features. Experimental results show that the average accuracy (ACC) and Matthews correlation coefficient (MCC) of the feature subsets (FSs) selected by FG-HFS on 5 gene expression datasets are 92.76% and 88.76%, respectively, which are significantly better than those of existing algorithms. In addition, the FSs and ACC/FSs indexes of FG-HFS are also better than those of existing algorithms, which further supports the superiority of FG-HFS. Moreover, the results of the Wilcoxon and Friedman statistical tests show that the feature selection performance of FG-HFS is significantly better than that of existing algorithms in both pairwise and multiple comparisons.
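The redundancy-filtering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used definition of symmetric uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and the approximate Markov blanket test from FCBF-style filters: a kept feature F_i covers F_j when SU(F_i, C) >= SU(F_j, C) and SU(F_i, F_j) >= SU(F_j, C). It also assumes that expression values have already been discretized and that the features passed in belong to one group. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def filter_group(features, labels):
    """Drop features covered by an approximate Markov blanket.

    features: dict mapping a (hypothetical) gene name to its discretized
    values; all features are assumed to come from one group.
    """
    relevance = {name: symmetric_uncertainty(col, labels)
                 for name, col in features.items()}
    # Visit features from most to least relevant to the class.
    ranked = sorted(features, key=lambda name: relevance[name], reverse=True)
    kept = []
    for name in ranked:
        # Every kept feature is at least as relevant as the current one,
        # so only the feature-to-feature condition needs checking.
        redundant = any(
            symmetric_uncertainty(features[k], features[name]) >= relevance[name]
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy data: g2 duplicates g1, g3 carries weaker but different information.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
group = {
    "g1": [0, 0, 0, 1, 1, 1, 1, 1],
    "g2": [0, 0, 0, 1, 1, 1, 1, 1],
    "g3": [0, 0, 1, 0, 1, 1, 0, 1],
}
kept = filter_group(group, labels)
print(kept)
```

In this toy group, g2 is removed because g1 is equally relevant to the class and perfectly correlated with it, while g3 survives because it shares little information with g1.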
Pages: 16
Related papers
50 in total
  • [1] A filter feature selection for high-dimensional data
    Janane, Fatima Zahra
    Ouaderhman, Tayeb
    Chamlal, Hasna
    JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2023, 17
  • [2] Benchmark of filter methods for feature selection in high-dimensional gene expression survival data
    Bommert, Andrea
    Welchowski, Thomas
    Schmid, Matthias
    Rahnenfuehrer, Joerg
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (01)
  • [3] Improved aquila optimizer with mRMR for feature selection of high-dimensional gene expression data
    Qin, Xiwen
    Zhang, Siqi
    Dong, Xiaogang
    Shi, Hongyu
    Yuan, Liping
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (09): 13005-13027
  • [4] Hybrid fast unsupervised feature selection for high-dimensional data
    Manbari, Zhaleh
    AkhlaghianTab, Fardin
    Salavati, Chiman
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 124: 97-118
  • [5] A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis
    Borah, Kasmika
    Das, Himanish Shekhar
    Seth, Soumita
    Mallick, Koushik
    Rahaman, Zubair
    Mallik, Saurav
    FUNCTIONAL & INTEGRATIVE GENOMICS, 2024, 24 (05)
  • [6] Hybrid Feature Selection for High-Dimensional Manufacturing Data
    Sun, Yajuan
    Yu, Jianlin
    Li, Xiang
    Wu, Ji Yan
    Lu, Wen Feng
    2021 26TH IEEE INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES AND FACTORY AUTOMATION (ETFA), 2021
  • [7] A hybrid feature selection method for high-dimensional data
    Taheri, Nooshin
    Nezamabadi-pour, Hossein
    2014 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2014: 141-145
  • [8] A hybrid feature selection scheme for high-dimensional data
    Ganjei, Mohammad Ahmadi
    Boostani, Reza
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 113
  • [9] FACO: A Novel Hybrid Feature Selection Algorithm for High-Dimensional Data Classification
    Popoola, Gideon
    Oyeniran, Kayode
    SOUTHEASTCON 2024, 2024: 61-68
  • [10] A hybrid feature selection algorithm for gene expression data classification
    Lu, Huijuan
    Chen, Junying
    Yan, Ke
    Jin, Qun
    Xue, Yu
    Gao, Zhigang
    NEUROCOMPUTING, 2017, 256: 56-62