High-dimensional variable selection and prediction under competing risks with application to SEER-Medicare linked data

Times cited: 11
Authors
Hou, Jiayi [1]
Paravati, Anthony [2]
Hou, Jue [3]
Xu, Ronghui [3, 4]
Murphy, James [2]
Affiliations
[1] Univ Calif San Diego, Altman Clin & Translat Res Inst, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Dept Radiat Med & Appl Sci, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
[4] Univ Calif San Diego, Dept Family Med & Publ Hlth, La Jolla, CA 92093 USA
Funding
National Institutes of Health (USA);
Keywords
boosting; cumulative incidence function; electronic medical record; LASSO; machine learning; precision medicine; PROPORTIONAL HAZARDS MODEL; CUMULATIVE INCIDENCE FUNCTION; ORACLE PROPERTIES; SURVIVAL ANALYSIS; ADAPTIVE LASSO; LINEAR-MODELS; REGRESSION; SUBDISTRIBUTION; LIKELIHOOD; TESTS;
DOI
10.1002/sim.7822
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Competing risk analysis considers event times due to multiple causes or of more than one event type. Commonly used regression models for such data include (1) the cause-specific hazards model, which focuses on modeling one type of event while acknowledging other event types simultaneously, and (2) the subdistribution hazards model, which links the covariate effects directly to the cumulative incidence function. Their use in the presence of high-dimensional predictors is largely unexplored. Motivated by an analysis using the linked SEER-Medicare database to predict cancer versus noncancer mortality for patients with prostate cancer, we study the accuracy of prediction and variable selection of existing machine learning methods under both models using extensive simulation experiments, including different approaches to choosing penalty parameters in each method. We then apply the optimal approaches to the analysis of the SEER-Medicare data.
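To make the cumulative incidence function mentioned in the abstract concrete, here is a minimal sketch of its standard nonparametric (Aalen-Johansen type) estimate in plain Python. This is not the paper's method or code: the function name `cumulative_incidence`, the convention that cause `0` marks right-censoring, and the toy data in the usage note are all assumptions made for illustration.

```python
from collections import Counter


def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence estimate for one cause.

    times  : observed follow-up times
    causes : event type per subject; 0 is ASSUMED to mean right-censored
    Returns a list of (time, CIF) pairs at each time with an observed event.
    """
    n_at_risk = len(times)
    leaving = Counter()        # time -> subjects leaving the risk set
    events = Counter()         # (time, cause) -> observed events
    for t, c in zip(times, causes):
        leaving[t] += 1
        if c != 0:
            events[(t, c)] += 1

    surv = 1.0   # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    curve = []
    for t in sorted(leaving):
        d_all = sum(n for (s, _), n in events.items() if s == t)
        d_k = events[(t, cause_of_interest)]
        if d_all > 0:
            # Probability of surviving every cause so far, then failing
            # from the cause of interest at time t.
            cif += surv * d_k / n_at_risk
            surv *= 1.0 - d_all / n_at_risk
            curve.append((t, cif))
        n_at_risk -= leaving[t]
    return curve
```

For example, with the made-up data `times = [1, 2, 3, 4, 5]` and `causes = [1, 2, 0, 1, 2]`, calling `cumulative_incidence(times, causes, 1)` returns the estimated probability of a cause-1 event by each event time. Because a subject who fails from a competing cause can no longer fail from the cause of interest, the estimate weights each cause-specific increment by the all-cause survival rather than treating competing events as censoring.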
Pages: 3486-3502
Number of pages: 17