Diagnostic classification of cancers using extreme gradient boosting algorithm and multi-omics data

Times cited: 133
Authors
Ma, Baoshan [1]
Meng, Fanyu [1]
Yan, Ge [1]
Yan, Haowen [1]
Chai, Bingjie [1]
Song, Fengju [2]
Affiliations
[1] Dalian Maritime Univ, Coll Informat Sci & Technol, Dalian 116026, Peoples R China
[2] Tianjin Med Univ Canc Inst & Hosp, Natl Clin Res Ctr Canc, Key Lab Mol Canc Epidemiol, Dept Epidemiol & Biostat, Tianjin 300060, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Diagnostic classification; Machine learning; Extreme gradient boosting; Multi-omics data; Cancer; SQUAMOUS-CELL CARCINOMA; EXPRESSION; GENE; HEAD; PREDICTION;
DOI
10.1016/j.compbiomed.2020.103761
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Accurate diagnostic classification of cancers can greatly help physicians choose surveillance and treatment strategies for patients. With the explosive growth of biological data, the shift from traditional biostatistical methods to computer-aided approaches has made machine-learning methods an integral part of today's cancer prognosis prediction. In this work, we proposed a classification model that leverages extreme gradient boosting (XGBoost) and increasingly complex multi-omics data to separate early-stage from late-stage cancers. We applied the XGBoost model to four kinds of cancer data downloaded from TCGA and compared its performance with that of other popular machine-learning methods. The experimental results showed that our method obtained statistically significantly better or comparable predictive performance. The results also revealed that DNA methylation outperforms the other molecular data types (mRNA expression and miRNA expression) in accuracy and stability when discriminating between early-stage and late-stage groups. Furthermore, integrating multi-omics data with an autoencoder can enhance the accuracy of cancer stage classification. Finally, we conducted bioinformatics analyses to assess the medical utility of the significant genes, ranked by their importance using the XGBoost algorithm. Extensive comparative experiments demonstrated that the XGBoost method performs remarkably well in predicting the stage of cancer patients from multi-omics data. Moreover, the identification of novel candidate genes associated with cancer stages would contribute to further elucidating disease pathogenesis and developing novel therapeutics.
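The abstract names extreme gradient boosting but gives no implementation details, so the following is only a minimal sketch of the underlying idea, not the authors' pipeline and not the xgboost library itself. It assumes binary labels (0 for early stage, 1 for late stage), uses depth-one trees (stumps) fitted with the second-order gain and L2-regularized leaf weights that characterize XGBoost-style boosting, and runs on synthetic data rather than TCGA data. All function names, parameters (`rounds`, `eta`, `lam`), and the data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, grad, hess, lam):
    """Return the split with the largest second-order gain, or None."""
    G, H = sum(grad), sum(hess)
    best = None  # (gain, feature, threshold, left_weight, right_weight)
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        GL = HL = 0.0
        for k in range(len(order) - 1):
            i = order[k]
            GL += grad[i]
            HL += hess[i]
            lo, hi = X[i][j], X[order[k + 1]][j]
            if lo == hi:
                continue  # cannot split between identical values
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL * GL / (HL + lam)
                          + GR * GR / (HR + lam)
                          - G * G / (H + lam))
            if best is None or gain > best[0]:
                # Regularized Newton step for each leaf: w = -G / (H + lam)
                best = (gain, j, (lo + hi) / 2.0,
                        -GL / (HL + lam), -GR / (HR + lam))
    return best


def fit_boosted_stumps(X, y, rounds=30, eta=0.3, lam=1.0):
    """Boost stumps on the logistic loss; returns a list of stumps."""
    scores = [0.0] * len(X)
    model = []
    for _ in range(rounds):
        p = [sigmoid(s) for s in scores]
        grad = [pi - yi for pi, yi in zip(p, y)]  # first derivative of log loss
        hess = [pi * (1.0 - pi) for pi in p]      # second derivative of log loss
        stump = fit_stump(X, grad, hess, lam)
        if stump is None:
            break
        _, j, thr, wl, wr = stump
        model.append((j, thr, eta * wl, eta * wr))  # store shrunken weights
        for i, row in enumerate(X):
            scores[i] += eta * (wl if row[j] < thr else wr)
    return model


def predict_proba(model, row):
    """Probability of the positive (late-stage) class for one sample."""
    return sigmoid(sum(wl if row[j] < thr else wr for j, thr, wl, wr in model))


# Synthetic stand-in for omics features: 200 samples, 5 features,
# where only the first two features carry signal about the label.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [1 if row[0] + 0.5 * row[1] > 0 else 0 for row in X]

model = fit_boosted_stumps(X, y)
pred = [1 if predict_proba(model, row) >= 0.5 else 0 for row in X]
acc = sum(int(p == t) for p, t in zip(pred, y)) / len(y)
print("training accuracy:", acc)
```

Counting how often each feature index appears in `model` gives a crude importance ranking, loosely analogous to the gene ranking the abstract describes. The accuracy printed here is measured on the training data, so it says nothing about generalization; a real comparison would need held-out samples or cross-validation.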
Pages: 10