Integrating multi-omics data through deep learning for accurate cancer prognosis prediction
Cited by: 102
Authors:
Chai, Hua [1]
Zhou, Xiang [1]
Zhang, Zhongyue [1]
Rao, Jiahua [1]
Zhao, Huiying [2]
Yang, Yuedong [1,3]
Affiliations:
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510000, Peoples R China
[2] Sun Yat Sen Univ, Sun Yat Sen Mem Hosp, Guangzhou 510000, Peoples R China
[3] Sun Yat Sen Univ, Key Lab Machine Intelligence & Adv Comp MOE, Guangzhou 510000, Peoples R China
Survival analysis;
Multi-omics;
Deep learning;
Cancer prognosis;
LYMPH-NODE METASTASIS;
BREAST-CANCER;
SURVIVAL;
HETEROGENEITY;
ASSOCIATION;
ACTIVATION;
DOI:
10.1016/j.compbiomed.2021.104481
Chinese Library Classification:
Q [Biological Sciences];
Discipline classification codes:
07 ;
0710 ;
09 ;
Abstract:
Background: Genomic information is now widely used to guide precise cancer treatment. Because any single type of omics data represents only one view and suffers from noise and bias, multiple types of omics data are required for accurate cancer prognosis prediction. However, integrating multi-omics data effectively is challenging because the number of redundant variables is large while the sample size is relatively small. With recent progress in deep learning, autoencoders have been used to integrate multi-omics data and extract representative features. Nevertheless, the resulting models are sensitive to data noise. In addition, previous studies usually focused on individual cancer types without comprehensive pan-cancer tests. Here, we employed a denoising autoencoder to obtain a robust representation of the multi-omics data, and then used the learned representative features to estimate patients' risks.
Results: Applied to 15 cancers from The Cancer Genome Atlas (TCGA), our method improved C-index values over previous methods by 6.5% on average. Considering the difficulty of obtaining multi-omics data in practice, we further trained XGBoost models to fit the estimated risks using only mRNA data, and found that these models achieved an average C-index of 0.627. As a case study, the breast cancer prognosis prediction model was independently tested on three datasets from the Gene Expression Omnibus (GEO) and was shown to significantly separate high-risk patients from low-risk ones (C-index > 0.6, p-values < 0.05). Based on the risk subgroups defined by our method, we identified nine prognostic markers highly associated with breast cancer, seven of which are supported by the literature.
Conclusion: Our comprehensive tests indicate that we have constructed an accurate and robust framework for integrating multi-omics data for cancer prognosis prediction. Moreover, it is an effective way to discover cancer prognosis-related genes.
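Note on the metric: the abstract reports performance as C-index (concordance index) values. As a rough illustration of what that number measures, the following is a minimal sketch of Harrell's formulation, assuming right-censored survival data. The function name `concordance_index` and the four-patient toy data are hypothetical and chosen only for illustration; they are not taken from the paper, and the authors' own evaluation code may differ.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  : observed follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : predicted risk score; higher means worse expected prognosis
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Only a patient with an observed event can anchor a comparable pair.
        if not events[i]:
            continue
        for j in range(n):
            # Pair is comparable when patient i failed strictly before
            # patient j's last observed time.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier failure got the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy cohort: the third patient is censored at time 15.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.6, 0.7, 0.1]
print(concordance_index(toy_times, toy_events, toy_risks))  # 0.8
```

In this toy cohort five pairs are comparable and four are ordered correctly, giving 0.8. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported average of 0.627 and the threshold of 0.6 should be read.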