FG-HFS: A feature filter and group evolution hybrid feature selection algorithm for high-dimensional gene expression data

Cited by: 12
Authors
Xu, Zhaozhao [1 ]
Yang, Fangyuan [2 ]
Tang, Chaosheng [1 ]
Wang, Hong [2 ]
Wang, Shuihua [3 ,4 ]
Sun, Junding [1 ]
Zhang, Yudong [1 ,3 ,5 ]
Affiliations
[1] Henan Polytech Univ, Sch Comp Sci & Technol, Jiaozuo 454000, Henan, Peoples R China
[2] Henan Polytech Univ, Dept Gynecol Oncol, Affiliated Hosp 1, Jiaozuo 454000, Henan, Peoples R China
[3] Univ Leicester, Sch Comp & Math Sci, Leicester LE1 7RH, England
[4] Xian Jiaotong Liverpool Univ, Dept Biol Sci, Suzhou 215123, Jiangsu, Peoples R China
[5] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
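Funding
National Natural Science Foundation of China; UK Biotechnology and Biological Sciences Research Council; UK Medical Research Council;
Keywords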
Gene expression data; Feature selection; Spectral clustering; Symmetric uncertainty; Multi-objective genetic algorithm; CLASSIFICATION; FRAMEWORK; MACHINE;
DOI
10.1016/j.eswa.2023.123069
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Gene expression data are high-dimensional with small sample sizes, and contain a large number of genes unrelated to disease. Feature selection improves the efficiency of disease diagnosis by selecting a small number of important genes. However, existing algorithms do not consider the correlation between features, and their search procedures tend to become trapped in local optima during the feature search. To address this, this paper proposes a feature filter and group evolution hybrid feature selection algorithm (FG-HFS) for high-dimensional gene expression data. First, unlike existing algorithms, spectral clustering is used to place redundant features in the same group. Second, a redundant feature filter algorithm removes the redundant features from each group according to the approximate Markov blanket principle. The filtered features are then divided evenly by density according to the feature exponential strategy. Finally, and most importantly, a group evolution multi-objective genetic algorithm searches the filtered feature subsets and evaluates candidate subsets in-group and out-group, so as to select the feature subset with the highest accuracy and the fewest features. Experimental results on 5 gene expression datasets show that the feature subsets (FSs) selected by FG-HFS achieve an average accuracy (ACC) of 92.76% and an average Matthews correlation coefficient (MCC) of 88.76%, significantly better than existing algorithms. FG-HFS also outperforms existing algorithms on the FSs and ACC/FSs indexes. Moreover, Wilcoxon and Friedman statistical tests show that the feature selection performance of FG-HFS is significantly better than that of existing algorithms in both pairwise and multiple comparisons.
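The redundancy filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes expression values have already been discretized, it uses the standard definition of symmetric uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and it applies the commonly used FCBF-style approximate Markov blanket criterion (a kept feature k covers feature f when SU(k, class) >= SU(f, class) and SU(k, f) >= SU(f, class)). The function names, gene names, and toy data are hypothetical, and the exact variant used in the paper may differ.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    joint = entropy(list(zip(x, y)))
    mutual_info = hx + hy - joint
    return 2.0 * mutual_info / (hx + hy)


def filter_group(features, labels):
    """Remove features covered by an approximate Markov blanket in one group.

    features: dict mapping a (hypothetical) gene name to its discrete values.
    Returns the retained gene names, most class-relevant first.
    """
    relevance = {
        name: symmetric_uncertainty(column, labels)
        for name, column in features.items()
    }
    # Visiting features by decreasing relevance guarantees that every kept
    # feature is at least as relevant as the candidate being examined.
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    kept = []
    for name in ranked:
        redundant = any(
            symmetric_uncertainty(features[k], features[name]) >= relevance[name]
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy data (assumed for illustration): a 4-class label determined by two
# independent binary genes, plus an inverted copy of the first gene.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
group = {
    "gene_a": [0, 0, 0, 0, 1, 1, 1, 1],
    "gene_a_inverted": [1, 1, 1, 1, 0, 0, 0, 0],
    "gene_b": [0, 0, 1, 1, 0, 0, 1, 1],
}
selected = filter_group(group, labels)
```

In this toy group the inverted copy carries the same information as `gene_a`, so it is treated as redundant, while `gene_b` is kept because it shares no information with `gene_a`. Because symmetric uncertainty is based on entropy, it detects the inverted relationship that a sign-sensitive measure would score as a negative association.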
Pages: 16
Related papers
50 in total
  • [41] Multistage feature selection approach for high-dimensional cancer data
    Alkuhlani, Alhasan
    Nassef, Mohammad
    Farag, Ibrahim
    SOFT COMPUTING, 2017, 21 (22) : 6895 - 6906
  • [42] Genetic Programming for Feature Selection and Construction to High-Dimensional Data
    Ma, Jianbin
    Zhu, Man
    2024 4TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND INTELLIGENT SYSTEMS ENGINEERING, MLISE 2024, 2024, : 196 - 200
  • [43] A hybrid feature selection algorithm for microarray data
    Zheng, Yuefeng
    Li, Ying
    Wang, Gang
    Chen, Yupeng
    Xu, Qian
    Fan, Jiahao
    Cui, Xueting
    JOURNAL OF SUPERCOMPUTING, 2020, 76 (05) : 3494 - 3526
  • [44] A fast dual-module hybrid high-dimensional feature selection algorithm
    Yang, Geying
    He, Junjiang
    Lan, Xiaolong
    Li, Tao
    Fang, Wenbo
    INFORMATION SCIENCES, 2024, 681
  • [45] Diagonal Discriminant Analysis With Feature Selection for High-Dimensional Data
    Romanes, Sarah E.
    Ormerod, John T.
    Yang, Jean Y. H.
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2020, 29 (01) : 114 - 127
  • [46] Hybrid binary arithmetic optimization algorithm with simulated annealing for feature selection in high-dimensional biomedical data
    Pashaei, Elham
    Pashaei, Elnaz
    JOURNAL OF SUPERCOMPUTING, 2022, 78 (13) : 15598 - 15637
  • [47] A hybrid feature selection approach based on ensemble method for high-dimensional data
    Rouhi, Amirreza
    Nezamabadi-pour, Hossein
    2017 2ND CONFERENCE ON SWARM INTELLIGENCE AND EVOLUTIONARY COMPUTATION (CSIEC), 2017, : 16 - 20
  • [49] A Hybrid Feature Selection Algorithm Applied to High-dimensional Imbalanced Small-sample Data Classification
    Feng, Fang
    Lv, Qingquan
    Wang, Mingsong
    Yang, Xuhui
    Zhou, Qingguo
    Zhou, Rui
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 41 - 46
  • [50] Feature Selection for High-Dimensional Data: The Issue of Stability
    Pes, Barbara
    2017 IEEE 26TH INTERNATIONAL CONFERENCE ON ENABLING TECHNOLOGIES - INFRASTRUCTURE FOR COLLABORATIVE ENTERPRISES (WETICE), 2017, : 170 - 175