Integrating multi-omics data through deep learning for accurate cancer prognosis prediction

Cited by: 86
Authors
Chai, Hua [1]
Zhou, Xiang [1]
Zhang, Zhongyue [1]
Rao, Jiahua [1]
Zhao, Huiying [2]
Yang, Yuedong [1,3]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510000, Peoples R China
[2] Sun Yat Sen Univ, Sun Yat Sen Mem Hosp, Guangzhou 510000, Peoples R China
[3] Sun Yat Sen Univ, Key Lab Machine Intelligence & Adv Comp MOE, Guangzhou 510000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Survival analysis; Multi-omics; Deep learning; Cancer prognosis; LYMPH-NODE METASTASIS; BREAST-CANCER; SURVIVAL; HETEROGENEITY; ASSOCIATION; ACTIVATION;
DOI
10.1016/j.compbiomed.2021.104481
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Background: Genomic information is now widely used for precision cancer treatment. Because any single type of omics data represents only one view and suffers from noise and bias, multiple types of omics data are required for accurate cancer prognosis prediction. However, integrating multi-omics data effectively is challenging because of the large number of redundant variables and the relatively small sample sizes. With recent progress in deep learning, autoencoders have been used to integrate multi-omics data and extract representative features. Nevertheless, the resulting models are sensitive to data noise. Additionally, previous studies usually focused on individual cancer types without comprehensive pan-cancer tests. Here, we employed a denoising autoencoder to obtain a robust representation of the multi-omics data, and then used the learned representative features to estimate patients' risks. Results: Applied to 15 cancers from The Cancer Genome Atlas (TCGA), our method improved C-index values over previous methods by 6.5% on average. Given the difficulty of obtaining multi-omics data in practice, we further trained XGBoost models that use only mRNA data to fit the estimated risks, and found that these models achieved an average C-index of 0.627. As a case study, the breast cancer prognosis prediction model was independently tested on three datasets from the Gene Expression Omnibus (GEO) and significantly separated high-risk patients from low-risk ones (C-index > 0.6, p-values < 0.05). Based on the risk subgroups defined by our method, we identified nine prognostic markers highly associated with breast cancer, seven of which have been confirmed by literature review. Conclusion: Our comprehensive tests indicate that we have constructed an accurate and robust framework for integrating multi-omics data for cancer prognosis prediction. Moreover, it is an effective way to discover cancer prognosis-related genes.
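The abstract reports its results as C-index values. As a reading aid, the following is a minimal sketch of one common definition of that metric, Harrell's concordance index, written in plain Python. It is not the authors' code: the function name `concordance_index`, its arguments, and the toy data are assumptions made for illustration, and the abstract does not state which C-index variant or tie-handling rule the paper uses.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index (illustrative sketch, not the paper's implementation).

    times  -- observed follow-up times
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- predicted risk scores; higher means worse expected prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # shorter time is an observed event (not a censored record).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk failed earlier: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.6, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
```

On this toy data five pairs are comparable and four of them are ordered correctly, so the value is 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the average of 0.627 reported for the mRNA-only models.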
Pages: 8
Related papers
50 in total
  • [21] MetaCancer: A deep learning-based pan-cancer metastasis prediction model developed using multi-omics data
    Albaradei, Somayah
    Napolitano, Francesco
    Thafar, Maha A.
    Gojobori, Takashi
    Essack, Magbubah
    Gao, Xin
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2021, 19 : 4404 - 4411
  • [22] Deep learning and multi-omics approach to predict drug responses in cancer
    Conghao Wang
    Xintong Lye
    Rama Kaalia
    Parvin Kumar
    Jagath C. Rajapakse
    BMC Bioinformatics, 22
  • [23] SADLN: Self-attention based deep learning network of integrating multi-omics data for cancer subtype recognition
    Sun, Qiuwen
    Cheng, Lei
    Meng, Ao
    Ge, Shuguang
    Chen, Jie
    Zhang, Longzhen
    Gong, Ping
    FRONTIERS IN GENETICS, 2023, 13
  • [24] A hierarchical integration deep flexible neural forest framework for cancer subtype classification by integrating multi-omics data
    Xu, Jing
    Wu, Peng
    Chen, Yuehui
    Meng, Qingfang
    Dawood, Hussain
    Dawood, Hassan
    BMC BIOINFORMATICS, 2019, 20 (01)
  • [25] Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis
    Li Tong
    Jonathan Mitchel
    Kevin Chatlin
    May D. Wang
    BMC Medical Informatics and Decision Making, 20
  • [26] Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis
    Tong, Li
    Mitchel, Jonathan
    Chatlin, Kevin
    Wang, May D.
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2020, 20 (01)
  • [27] Identification of Pan-Cancer Prognostic Biomarkers Through Integration of Multi-Omics Data
    Zhao, Ning
    Guo, Maozu
    Wang, Kuanquan
    Zhang, Chunlong
    Liu, Xiaoyan
    FRONTIERS IN BIOENGINEERING AND BIOTECHNOLOGY, 2020, 8
  • [28] Improving prediction performance of colon cancer prognosis based on the integration of clinical and multi-omics data
    Tong, Danyang
    Tian, Yu
    Zhou, Tianshu
    Ye, Qiancheng
    Li, Jun
    Ding, Kefeng
    Li, Jingsong
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2020, 20 (01)
  • [29] A machine learning and deep learning-based integrated multi-omics technique for leukemia prediction
    Abbasi, Erum Yousef
    Deng, Zhongliang
    Ali, Qasim
    Khan, Adil
    Shaikh, Asadullah
    Al Reshan, Mana Saleh
    Sulaiman, Adel
    Alshahrani, Hani
    HELIYON, 2024, 10 (03)
  • [30] Breast Cancer Risk Analysis Using Deep Learning on Multi-omics Data Combined with Epigenetic Factors
    Kumar, M. Gireesh
    Aparna, P.
    Gopakumar, G.
    INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS 2022, ICBHI 2022, 2024, 108 : 35 - 43