Covariate dimension reduction for survival data via the Gaussian process latent variable model

Cited by: 1
Authors
Barrett, James E. [1]
Coolen, Anthony C. C. [1]
Affiliations
[1] Kings Coll London, Inst Math & Mol Biomed, Hodgkin Bldg,Guys Campus, London SE1 1UL, England
Keywords
dimensionality reduction; survival analysis; Gaussian process latent variable model; Weibull proportional hazards model
DOI
10.1002/sim.6784
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
The analysis of high-dimensional survival data is challenging, primarily owing to the problem of overfitting, which occurs when spurious relationships are inferred from data that subsequently fail to exist in test data. Here, we propose a novel method of extracting a low-dimensional representation of covariates in survival data by combining the popular Gaussian process latent variable model with a Weibull proportional hazards model. The combined model offers a flexible non-linear probabilistic method of detecting and extracting any intrinsic low-dimensional structure from high-dimensional data. By reducing the covariate dimension, we aim to diminish the risk of overfitting and increase the robustness and accuracy with which we infer relationships between covariates and survival outcomes. In addition, we can simultaneously combine information from multiple data sources by expressing multiple datasets in terms of the same low-dimensional space. We present results from several simulation studies that illustrate a reduction in overfitting and an increase in predictive performance, as well as successful detection of intrinsic dimensionality. We provide evidence that it is advantageous to combine dimensionality reduction with survival outcomes rather than performing unsupervised dimensionality reduction on its own. Finally, we use our model to analyse experimental gene expression data and detect and extract a low-dimensional representation that allows us to distinguish high-risk and low-risk groups with superior accuracy compared with doing regression on the original high-dimensional data. Copyright (c) 2015 John Wiley & Sons, Ltd.
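The abstract names a Weibull proportional hazards model as the survival component but does not give its form. As a rough illustration only, the sketch below evaluates a right-censored Weibull proportional hazards log-likelihood on low-dimensional covariates. The parametrization (baseline hazard (shape/scale)·(t/scale)^(shape−1) scaled by exp(X·beta)) is the standard textbook form, not necessarily the one used in the paper. The function name `weibull_ph_loglik` and all argument names are hypothetical.

```python
import numpy as np

def weibull_ph_loglik(times, events, X, beta, shape, scale):
    """Right-censored Weibull proportional hazards log-likelihood.

    Assumed (textbook) parametrization, not taken from the paper:
      hazard       h(t | x) = (shape / scale) * (t / scale)**(shape - 1) * exp(x @ beta)
      cum. hazard  H(t | x) = (t / scale)**shape * exp(x @ beta)
    times  : positive event or censoring times, length n
    events : 1 if the event was observed, 0 if right-censored, length n
    X      : (n, q) array of low-dimensional (e.g. latent) covariates
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    # Linear predictor x_i @ beta for each individual
    lin = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    # Log of the baseline Weibull hazard at each time
    log_base_hazard = np.log(shape / scale) + (shape - 1.0) * np.log(times / scale)
    # Cumulative hazard, including the covariate effect
    cum_hazard = (times / scale) ** shape * np.exp(lin)
    # Observed events contribute log h(t) - H(t); censored cases contribute -H(t)
    return float(np.sum(events * (log_base_hazard + lin) - cum_hazard))
```

In a combined model of the kind the abstract describes, `X` would hold the latent coordinates rather than the original high-dimensional covariates; how the two likelihoods are coupled is not specified in this record.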
Pages: 1340-1353
Page count: 14