Robust biomarker discovery for hepatocellular carcinoma from high-throughput data by multiple feature selection methods

Cited by: 17
Authors
Zhang, Zishuang [1]
Liu, Zhi-Ping [1, 2]
Affiliations
[1] Shandong Univ, Sch Control Sci & Engn, Dept Biomed Engn, Jinan 250061, Shandong, Peoples R China
[2] Shandong Univ, Ctr Intelligent Med, Jinan 250061, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Biomarker discovery; Omics data; Feature selection; Akaike information criterion; Hepatocellular carcinoma; IDENTIFICATION; DISEASES;
DOI
10.1186/s12920-021-00957-4
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Background: Hepatocellular carcinoma (HCC) is one of the most common cancers. The discovery of specific genes serving as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by The Cancer Genome Atlas (TCGA) consortium provide a valuable resource for the discovery of HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers. However, these methods have not investigated the robustness of identification across different feature selection techniques.

Methods: We use six different recursive feature elimination methods to select the gene signatures of HCC from TCGA liver cancer data. The genes shared by the six selected subsets are proposed as robust biomarkers. The Akaike information criterion (AIC) is employed to explain the optimization process of feature selection, which provides a statistical interpretation for feature selection in machine learning methods. We also use several methods to validate the screened biomarkers.

Results: In this paper, we propose a robust method for discovering biomarker genes for HCC from gene expression data. Specifically, we implement recursive feature elimination with cross-validation (RFE-CV) based on six different classification algorithms. The overlaps among the gene sets discovered by the different methods are referred to as the identified biomarkers. Using the AIC from statistics, we give an interpretation of the machine-learning-based feature selection process. Furthermore, the features selected by backward stepwise logistic regression under the minimum AIC criterion are completely contained in the identified biomarkers. The classification results verify the superiority of the interpretable, robust biomarker discovery method.

Conclusions: We find overlaps among the gene subsets, which contain different numbers of features, selected by the RFE-CV of the six classifiers. The AIC values in the model selection provide a theoretical foundation for the feature selection process of biomarker discovery via machine learning. Moreover, genes contained in more of the optimally selected subsets make better biological sense. The quality of feature selection is improved by intersecting the biomarkers selected by different classifiers. This is a general method suitable for screening biomarkers of complex diseases from high-throughput data.
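
The two core ideas in the abstract, intersecting the gene subsets chosen by several selectors and scoring a model by AIC, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`aic`, `robust_biomarkers`, `rank_by_support`), the method labels, and the gene identifiers `G1` to `G5` are all hypothetical placeholders, and only three selectors are shown rather than six. The AIC formula used is the standard definition, AIC = 2k - 2 ln(L).

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Standard Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def robust_biomarkers(subsets: Dict[str, Set[str]]) -> Set[str]:
    """Return the genes selected by every method (intersection of all subsets)."""
    if not subsets:
        return set()
    return set.intersection(*subsets.values())


def rank_by_support(subsets: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Order genes by how many subsets contain them (ties broken alphabetically)."""
    counts = Counter(gene for genes in subsets.values() for gene in genes)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical subsets, standing in for the output of separate RFE-CV runs.
selected = {
    "method_a": {"G1", "G2", "G3"},
    "method_b": {"G2", "G3", "G4"},
    "method_c": {"G2", "G3", "G5"},
}

shared = robust_biomarkers(selected)   # genes common to all methods
ranking = rank_by_support(selected)    # genes with their support counts
```

Under these assumptions, a gene's support count gives a simple way to see which genes appear in more of the selected subsets, and a lower `aic` value indicates a preferred model when comparing candidate feature sets fitted to the same data.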
Pages: 12