Variable selection for high-dimensional genomic data with censored outcomes using group lasso prior

Cited by: 8
Authors
Lee, Kyu Ha [1, 2]
Chakraborty, Sounak [3 ]
Sun, Jianguo [3 ]
Affiliations
[1] Forsyth Inst, Epidemiol & Biostat Core, Cambridge, MA USA
[2] Harvard Sch Dent Med, Dept Oral Hlth Policy & Epidemiol, Boston, MA USA
[3] Univ Missouri, Dept Stat, Columbia, MO 65211 USA
Funding
National Science Foundation (USA);
Keywords
Accelerated failure time model; Bayesian lasso; Gibbs sampler; Group lasso; Penalized regression; FAILURE TIME MODEL; MICROARRAY DATA; SURVIVAL ANALYSIS; HAZARD RATIOS; ELASTIC NET; COX MODEL; REGRESSION; PREDICTION; SHRINKAGE;
DOI
10.1016/j.csda.2017.02.014
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The variable selection problem is discussed in the context of high-dimensional failure time data arising from the accelerated failure time model. A data augmentation approach is employed in order to deal with censored survival times and to facilitate prior-posterior conjugacy. To identify a set of grouped relevant covariates, a shrinkage prior distribution is specified for regression coefficients mimicking the effect of group lasso penalty. It is noted that unlike the corresponding frequentist method, a Bayesian penalized regression approach cannot shrink the estimates of coefficients to exact zeros in general. Towards resolving the issue, a two-stage thresholding method that exploits the scaled neighborhood criterion and the Bayesian information criterion is devised. Simulation studies are performed to assess the robustness and performance of the proposed method in terms of variable selection accuracy and predictive power. The method is successfully applied to a set of microarray data on the individuals diagnosed with diffuse large B-cell lymphoma. In addition, an R package called psbcGroup, which can be downloaded freely from CRAN, is developed for the implementation of the methods. (C) 2017 Elsevier B.V. All rights reserved.
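Because posterior draws of a coefficient are never exactly zero, a thresholding rule is needed to turn them into a selection. The sketch below illustrates only the first-stage idea, under one common reading of the scaled neighborhood criterion: a coefficient is excluded when the posterior probability that it lies within one posterior standard deviation of zero exceeds a threshold. The function names, the threshold of 0.5, and the simulated draws are all assumptions made for illustration; this is not the psbcGroup API, and the second, BIC-based stage is not shown.

```python
import random
import statistics


def snc_exclusion_probability(draws):
    """Share of posterior draws with |beta| within one posterior SD of zero."""
    sd = statistics.pstdev(draws)
    inside = sum(1 for b in draws if abs(b) <= sd)
    return inside / len(draws)


def select_by_snc(samples, threshold=0.5):
    """Keep coefficient j when its exclusion probability is at most `threshold`.

    `threshold` is a hypothetical tuning value chosen for this sketch.
    """
    return [
        j
        for j, draws in enumerate(samples)
        if snc_exclusion_probability(draws) <= threshold
    ]


rng = random.Random(0)
# Simulated posterior draws (illustrative only, not sampler output):
# coefficient 0 is concentrated around zero, coefficient 1 is far from zero.
samples = [
    [rng.gauss(0.0, 0.1) for _ in range(2000)],
    [rng.gauss(1.5, 0.1) for _ in range(2000)],
]
selected = select_by_snc(samples)
print(selected)
```

In this toy setup the first coefficient has roughly two thirds of its draws inside the neighborhood and is dropped, while the second has essentially none and is kept.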
Pages: 1-13
Number of pages: 13
Related papers
50 in total
  • [31] SPARSE COVARIANCE THRESHOLDING FOR HIGH-DIMENSIONAL VARIABLE SELECTION
    Jeng, X. Jessie
    Daye, Z. John
    STATISTICA SINICA, 2011, 21 (02) : 625 - 657
  • [32] Comparison of Lasso Type Estimators for High-Dimensional Data
    Kim, Jaehee
    COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS, 2014, 21 (04) : 349 - 361
  • [33] Pathway Lasso: pathway estimation and selection with high-dimensional mediators
    Zhao, Yi
    Luo, Xi
    STATISTICS AND ITS INTERFACE, 2022, 15 (01) : 39 - 50
  • [34] The sparsity and bias of the lasso selection in high-dimensional linear regression
    Zhang, Cun-Hui
    Huang, Jian
    ANNALS OF STATISTICS, 2008, 36 (04) : 1567 - 1594
  • [35] An ensemble learning method for variable selection: application to high-dimensional data and missing values
    Bar-Hen, Avner
    Audigier, Vincent
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2022, 92 (16) : 3488 - 3510
  • [36] Variable selection techniques after multiple imputation in high-dimensional data
    Zahid, Faisal Maqbool
    Faisal, Shahla
    Heumann, Christian
    STATISTICAL METHODS AND APPLICATIONS, 2020, 29 (03) : 553 - 580
  • [37] Integrative analysis and variable selection with multiple high-dimensional data sets
    Ma, Shuangge
    Huang, Jian
    Song, Xiao
    BIOSTATISTICS, 2011, 12 (04) : 763 - 775
  • [38] Bayesian adaptive lasso with variational Bayes for variable selection in high-dimensional generalized linear mixed models
    Dao Thanh Tung
    Minh-Ngoc Tran
    Tran Manh Cuong
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2019, 48 (02) : 530 - 543
  • [39] A Study of High-Dimensional Data Imputation Using Additive LASSO Regression Model
    Lavanya, K.
    Reddy, L. S. S.
    Reddy, B. Eswara
    COMPUTATIONAL INTELLIGENCE IN DATA MINING, 2019, 711 : 19 - 30
  • [40] Variable selection in high-dimensional quantile varying coefficient models
    Tang, Yanlin
    Song, Xinyuan
    Wang, Huixia Judy
    Zhu, Zhongyi
    JOURNAL OF MULTIVARIATE ANALYSIS, 2013, 122 : 115 - 132