Integrating multi-omics data through deep learning for accurate cancer prognosis prediction

Cited by: 102
Authors
Chai, Hua [1]
Zhou, Xiang [1]
Zhang, Zhongyue [1]
Rao, Jiahua [1]
Zhao, Huiying [2]
Yang, Yuedong [1,3]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510000, Peoples R China
[2] Sun Yat Sen Univ, Sun Yat Sen Mem Hosp, Guangzhou 510000, Peoples R China
[3] Sun Yat Sen Univ, Key Lab Machine Intelligence & Adv Comp MOE, Guangzhou 510000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Survival analysis; Multi-omics; Deep learning; Cancer prognosis; LYMPH-NODE METASTASIS; BREAST-CANCER; SURVIVAL; HETEROGENEITY; ASSOCIATION; ACTIVATION;
DOI
10.1016/j.compbiomed.2021.104481
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Genomic information is now widely used to guide precision cancer treatment. Because any single type of omics data offers only one view and is subject to noise and bias, accurate cancer prognosis prediction requires multiple omics types. However, integrating multi-omics data effectively is challenging because the number of redundant variables is large while the sample size is relatively small. With recent progress in deep learning, autoencoders have been used to integrate multi-omics data and extract representative features. Nevertheless, the resulting models are sensitive to data noise. In addition, previous studies have usually focused on individual cancer types without comprehensive pan-cancer testing. Here, we employed a denoising autoencoder to obtain a robust representation of the multi-omics data, and then used the learned features to estimate patients' risks.
Results: Applied to 15 cancers from The Cancer Genome Atlas (TCGA), our method improved C-index values over previous methods by 6.5% on average. Considering the difficulty of obtaining multi-omics data in practice, we further trained XGBoost models to fit the estimated risks using only mRNA data, and found that these models achieved an average C-index of 0.627. As a case study, the breast cancer prognosis prediction model was independently tested on three datasets from the Gene Expression Omnibus (GEO) and was shown to significantly separate high-risk patients from low-risk ones (C-index > 0.6, p-values < 0.05). Based on the risk subgroups defined by our method, we identified nine prognostic markers highly associated with breast cancer, seven of which are supported by the literature.
Conclusion: Our comprehensive tests indicate that we have constructed an accurate and robust framework for integrating multi-omics data for cancer prognosis prediction. Moreover, it is an effective way to discover cancer prognosis-related genes.
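The abstract reports its results as C-index values. As a reading aid, the following is a minimal sketch of how Harrell's concordance index can be computed from survival times, event indicators, and predicted risks. It is not the authors' code: the function name `concordance_index`, the simplified handling of tied survival times, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index (simplified sketch, not the paper's implementation).

    times  : observed follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : predicted risk score per patient (higher = worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk for the earlier event: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the second one censored.
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.4, 0.1, 0.6]
print(concordance_index(times, events, risks))  # 0.75
```

In this toy example four pairs are comparable and three of them are ranked consistently, giving 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported average of 0.627 and the C-index > 0.6 threshold.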
Pages: 8