Robust biomarker discovery for hepatocellular carcinoma from high-throughput data by multiple feature selection methods

Cited by: 17

Authors:

Zhang, Zishuang [1]

Liu, Zhi-Ping [1,2]

Affiliations:

[1] Shandong Univ, Sch Control Sci & Engn, Dept Biomed Engn, Jinan 250061, Shandong, Peoples R China

[2] Shandong Univ, Ctr Intelligent Med, Jinan 250061, Shandong, Peoples R China

Source:

BMC MEDICAL GENOMICS | 2021 / Vol. 14 / Suppl. 1

Funding:

National Natural Science Foundation of China;

Keywords:

Biomarker discovery; Omics data; Feature selection; Akaike information criterion; Hepatocellular carcinoma; IDENTIFICATION; DISEASES;

DOI:

10.1186/s12920-021-00957-4

Chinese Library Classification (CLC) number:

Q3 [Genetics];

Subject classification code:

071007 ; 090102 ;

Abstract:

Background: Hepatocellular carcinoma (HCC) is one of the most common cancers. The discovery of specific genes serving as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by The Cancer Genome Atlas (TCGA) consortium provide a valuable resource for the discovery of HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers. However, these methods have not investigated the robustness of identification across different feature selection techniques.

Methods: We use six different recursive feature elimination methods to select the gene signatures of HCC from TCGA liver cancer data. The genes shared by the six selected subsets are proposed as robust biomarkers. The Akaike information criterion (AIC) is employed to explain the optimization process of feature selection, which provides a statistical interpretation for feature selection in machine learning methods. We also use several methods to validate the screened biomarkers.

Results: In this paper, we propose a robust method for discovering biomarker genes for HCC from gene expression data. Specifically, we implement recursive feature elimination with cross-validation (RFE-CV) based on six different classification algorithms. The overlaps among the gene sets discovered by the different methods are referred to as the identified biomarkers. We interpret the machine-learning-based feature selection process using the AIC from statistics. Furthermore, the features selected by backward stepwise logistic regression under the minimum-AIC criterion are completely contained in the identified biomarkers. The classification results verify the superiority of the interpretable, robust biomarker discovery method.

Conclusions: Overlaps are found among the gene subsets, which contain different numbers of features, selected by the RFE-CV of the six classifiers. The AIC values in model selection provide a theoretical foundation for the feature selection process of biomarker discovery via machine learning. Moreover, genes contained in more of the optimally selected subsets make better biological sense. The quality of feature selection is improved by intersecting the biomarkers selected by different classifiers. This is a general method suitable for screening biomarkers of complex diseases from high-throughput data.
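The two ideas at the core of the abstract, intersecting the gene subsets chosen by several selectors and comparing candidate models by AIC, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the RFE-CV step itself is not reproduced, and the classifier labels (`clf_a`, `clf_b`, `clf_c`), gene names (`G1` to `G6`) and log-likelihood values are hypothetical placeholders chosen only for illustration. The AIC is computed with its standard definition, AIC = 2k - 2 ln L, where k is the number of fitted parameters and ln L is the maximized log-likelihood.

```python
from collections import Counter


def robust_biomarkers(selected_subsets):
    """Return the genes present in every selected subset (set intersection)."""
    subsets = [set(genes) for genes in selected_subsets.values()]
    return set.intersection(*subsets) if subsets else set()


def subset_counts(selected_subsets):
    """Count in how many selected subsets each gene appears."""
    return Counter(g for genes in selected_subsets.values() for g in set(genes))


def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical output of three feature selectors (placeholder names).
selected = {
    "clf_a": ["G1", "G2", "G3", "G4"],
    "clf_b": ["G2", "G3", "G5"],
    "clf_c": ["G2", "G3", "G4", "G6"],
}

shared = robust_biomarkers(selected)   # genes chosen by every selector
counts = subset_counts(selected)       # per-gene support across selectors
print(sorted(shared))
print(counts.most_common())

# Hypothetical fits: (maximized log-likelihood, number of parameters).
candidates = {"full_model": (-40.0, 7), "reduced_model": (-41.0, 3)}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)     # lower AIC is preferred
print(scores, best)
```

With these placeholder numbers the reduced model has the lower AIC, because its small loss in log-likelihood is outweighed by the penalty saved on four parameters. This is the trade-off that backward stepwise selection by AIC exploits when it drops features.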


Pages: 12
