FRL: An Integrative Feature Selection Algorithm Based on the Fisher Score, Recursive Feature Elimination, and Logistic Regression to Identify Potential Genomic Biomarkers

Cited by: 3
Authors
Ge, Chenyu [1]
Luo, Liqun [2]
Zhang, Jialin [3]
Meng, Xiangbing [4]
Chen, Yun [5]
Affiliations
[1] Shandong Univ, Sch Mech Elect & Informat Engn, Jinan 250000, Peoples R China
[2] Peking Univ, Dept Informat Management, Beijing 100000, Peoples R China
[3] Paris Saclay Univ, Lab Rech Informat, F-91405 Paris, France
[4] Qufu Inst Tradit Chinese Med Hlth & Rehabil, Qufu 273100, Shandong, Peoples R China
[5] Shandong Univ TCM, Hosp 2, Jinan 250000, Peoples R China
Keywords
PRECISION MEDICINE; CANCER; CLASSIFICATION; PROGRESSION; PROLIFERATION; SIGNATURE; HSPB8;
DOI
10.1155/2021/4312850
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Accurate screening of cancer biomarkers contributes to health assessment, drug screening, and targeted therapy for precision medicine. The rapid development of high-throughput sequencing technology has identified abundant genomic biomarkers, but most of them are limited to single-cancer analysis. This paper proposes an integrative feature selection algorithm, FRL, which combines the Fisher score, Recursive feature elimination, and Logistic regression to explore potential cancer genomic biomarkers on cancer subsets. The Fisher score is first used to calculate gene weights and rapidly reduce the dimension. Recursive feature elimination and logistic regression are then jointly employed to extract the optimal subset. Compared with GEO2R, a current differential expression analysis tool based on the Limma algorithm, FRL achieves higher classification precision. Compared with five traditional feature selection algorithms, FRL performs better on accuracy (ACC) and F1-score and greatly improves computational efficiency. On high-noise datasets such as esophageal cancer, the ACC of FRL is 30% higher than the average ACC achieved with the other traditional algorithms. Biomarkers found in multiple studies are more reliable and reproducible, and show a stronger association with potential clinical value, than those from a single analysis. Therefore, through literature review and spatial analyses of gene functional enrichment and functional pathways, we conduct cluster analysis on 10 diverse cancers with high mortality and form a potential biomarker module comprising 19 genes. All genes in this module can serve as potential biomarkers that provide more information on the overall oncogenesis mechanism for the early detection of diverse cancers and assist targeted anticancer therapies for further developments in precision medicine.
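The two-stage procedure described in the abstract (a Fisher score filter followed by recursive feature elimination driven by logistic regression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the synthetic data, the gradient-descent logistic regression, and the parameters `n_prefilter` and `n_final` are all assumptions made for illustration, and the abstract does not state how the paper sets the corresponding cutoffs.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter / within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero

def fit_logistic(X, y, lr=0.1, n_iter=500, l2=1e-2):
    """L2-regularised logistic regression fitted by plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30, 30)      # clip to keep exp() stable
        p = 1.0 / (1.0 + np.exp(-z))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def frl_sketch(X, y, n_prefilter, n_final):
    """Hypothetical two-stage selection: Fisher filter, then RFE with logistic regression."""
    # Stage 1: keep the n_prefilter features with the highest Fisher score.
    scores = fisher_score(X, y)
    keep = np.argsort(scores)[::-1][:n_prefilter].tolist()
    # Standardise so that coefficient magnitudes are comparable across features.
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Stage 2: repeatedly refit and drop the feature with the smallest |weight|.
    while len(keep) > n_final:
        w, _ = fit_logistic(Xs[:, keep], y)
        keep.pop(int(np.argmin(np.abs(w))))
    return sorted(keep)

# Synthetic stand-in for an expression matrix: 200 samples, 50 "genes",
# of which only the first three differ between the two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 50))
X[:, :3] += 2.0 * y[:, None]

selected = frl_sketch(X, y, n_prefilter=10, n_final=3)
print(selected)
```

The filter stage is cheap because it scores each feature independently, so the more expensive refitting loop only runs on the reduced set. Standardising before elimination is a design choice of this sketch, made so that the smallest absolute coefficient is a meaningful elimination criterion.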
Pages: 16