Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression

Cited by: 18
Authors
Laimighofer, Michael [1 ,2 ]
Krumsiek, Jan [1 ,3 ]
Buettner, Florian [1 ,4 ]
Theis, Fabian J. [1 ,2 ]
Affiliations
[1] Helmholtz Zentrum Munchen, Inst Computat Biol, Ingolstadter Landstr 1, D-85764 Neuherberg, Germany
[2] Tech Univ Munich, Dept Math, Garching, Germany
[3] German Ctr Diabet Res DZD, Munich, Germany
[4] European Mol Biol Lab, European Bioinformat Inst, Hinxton, Cambridge, England
Funding
UK Medical Research Council;
Keywords
high-dimensional survival regression; feature selection; repeated nested cross validation; PENALIZED COX REGRESSION; BREAST-CANCER PATIENTS; EXPRESSION; MODEL; RISK;
DOI
10.1089/cmb.2015.0192
Chinese Library Classification
Q5 [Biochemistry];
Subject Classification Codes
071010; 081704;
Abstract
With the widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example for biomarker discovery, is becoming an increasingly important part of clinical medicine because such datasets constitute a promising resource for predicting survival outcomes. However, early experience has shown that biomarkers often generalize poorly. Thus, it is crucial that models are not overfitted and give accurate results with new data. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. We present an approach that addresses both aspects in high-dimensional survival models. Within a nested cross-validation (CV), we fit a survival model, estimate its predictive performance on held-out data in an unbiased fashion, and select the features with the best predictive power by applying a weighted combination of CV runs. We evaluate our approach on simulated toy data, as well as on three breast cancer datasets, to predict the survival of breast cancer patients after treatment. In all datasets, we achieve a more reliable estimation of predictive power for unseen cases and better predictive performance compared to the standard CoxLasso model. Taken together, we present a comprehensive and flexible framework for survival models, including performance estimation, final feature selection, and final model construction. The proposed algorithm is implemented in an open-source R package (SurvRank) available on CRAN.
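The workflow sketched in the abstract (an outer CV loop for unbiased performance estimation, an inner CV for tuning, and aggregation of the features selected across outer runs) can be illustrated with a short, hedged example. The sketch below is not the SurvRank implementation: it assumes the standard survival and glmnet R packages, uses cv.glmnet as the inner CoxLasso, evaluates each outer fold with Harrell's C-index via survival::concordance, and replaces the paper's weighted combination of CV runs with a plain selection-frequency tally; the toy data, fold count, and variable names are invented for illustration.

## Minimal sketch: nested CV for a penalized Cox model (assumptions noted above)
library(survival)
library(glmnet)

set.seed(1)
n <- 200; p <- 500
x <- matrix(rnorm(n * p), n, p)               # high-dimensional predictors (toy data)
time <- rexp(n, rate = exp(0.5 * x[, 1]))     # toy survival times driven by feature 1
status <- rbinom(n, 1, 0.7)                   # event indicator (1 = event, 0 = censored)
ysurv <- Surv(time, status)                   # response for evaluation with 'survival'
ymat  <- cbind(time = time, status = status)  # response format accepted by glmnet's Cox family

n_outer <- 5                                  # illustrative number of outer folds
fold <- sample(rep(seq_len(n_outer), length.out = n))
cindex <- numeric(n_outer)
selected <- vector("list", n_outer)

for (k in seq_len(n_outer)) {
  train <- fold != k
  ## inner CV (cv.glmnet) tunes the lasso penalty on the training part only
  fit <- cv.glmnet(x[train, ], ymat[train, ], family = "cox")
  ## features with nonzero coefficients at the chosen penalty in this outer fold
  beta <- as.numeric(coef(fit, s = "lambda.min"))
  selected[[k]] <- which(beta != 0)
  ## unbiased evaluation on the held-out outer fold (Harrell's C-index;
  ## reverse = TRUE because a larger Cox linear predictor means higher risk)
  lp <- as.numeric(predict(fit, newx = x[!train, ], s = "lambda.min"))
  cindex[k] <- concordance(ysurv[!train] ~ lp, reverse = TRUE)$concordance
}

mean(cindex)                                      # outer-loop estimate of predictive power
sort(table(unlist(selected)), decreasing = TRUE)  # how often each feature was selected

In the framework described by the abstract, a final model would then be built from the aggregated feature ranking, and the outer-loop estimate, rather than the resubstitution error, would be reported as the expected performance on unseen cases.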
Pages: 279-290
Number of pages: 12