Imputation-based Q-learning for optimizing dynamic treatment regimes with right-censored survival outcome

Cited by: 5
Authors
Lyu, Lingyun [1]
Cheng, Yu [1, 2]
Wahed, Abdus S. [3]
Affiliations
[1] Univ Pittsburgh, Dept Biostat, Pittsburgh, PA USA
[2] Univ Pittsburgh, Dept Stat, Pittsburgh, PA USA
[3] Univ Rochester, Dept Biostat & Computat Biol, Rochester, NY 14642 USA
Keywords
Cox proportional hazard model; hot-deck multiple imputation; optimal dynamic treatment regime; precision medicine; propensity score; INFERENCE; MODELS
DOI
10.1111/biom.13872
Chinese Library Classification number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Q-learning has been one of the most commonly used methods for optimizing dynamic treatment regimes (DTRs) in multistage decision-making. Right-censored survival outcomes pose a significant challenge to Q-learning because it relies on parametric models for counterfactual estimation, and these models are subject to misspecification and sensitive to missing covariates. In this paper, we propose an imputation-based Q-learning (IQ-learning) in which flexible nonparametric or semiparametric models are employed to estimate optimal treatment rules for each stage, and weighted hot-deck multiple imputation (MI) and direct-draw MI are then used to predict optimal potential survival times. Missing data are handled using inverse probability weighting and MI, and nonrandom treatment assignment among the observed is accounted for using a propensity-score approach. We investigate the performance of IQ-learning via extensive simulations and show that it is more robust to model misspecification than existing Q-learning methods, imputes only plausible potential survival times, unlike parametric models, and provides more flexibility in terms of baseline hazard shape. Using IQ-learning, we developed an optimal DTR for leukemia treatment based on the randomized trial with observational follow-up that motivated this study.
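The weighted hot-deck idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `hot_deck_impute`, the scalar matching `score`, the Gaussian kernel weights, and the rule of leaving a censoring time unchanged when no donor exists are all assumptions made for illustration. The sketch only shows why donor-based imputation yields plausible values: each censored time is replaced by an observed event time from a donor who survived longer.

```python
import numpy as np

def hot_deck_impute(time, event, score, n_imputations=5, bandwidth=1.0, seed=0):
    """Return an (n_imputations, n) array of completed survival times.

    Hypothetical helper: `score` is any scalar used to match recipients to
    donors (for example a fitted risk or propensity-type score).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    score = np.asarray(score, dtype=float)
    rng = np.random.default_rng(seed)
    # Start every imputed data set from the observed follow-up times.
    completed = np.tile(time, (n_imputations, 1))
    for i in np.flatnonzero(~event):
        # Donors: subjects with an observed event after subject i's censoring time.
        donors = np.flatnonzero(event & (time > time[i]))
        if donors.size == 0:
            continue  # assumption: with no eligible donor, keep the censoring time
        # Gaussian kernel weights: donors with a closer score are drawn more often.
        w = np.exp(-0.5 * ((score[donors] - score[i]) / bandwidth) ** 2)
        p = w / w.sum() if w.sum() > 0 else None  # fall back to uniform draws
        completed[:, i] = rng.choice(time[donors], size=n_imputations, p=p)
    return completed

# Toy data: subjects 1 and 4 are censored at times 3 and 4.
time = np.array([2.0, 3.0, 5.0, 7.0, 4.0])
event = np.array([1, 0, 1, 1, 0])
score = np.array([0.1, 0.4, 0.5, 0.9, 0.8])
completed = hot_deck_impute(time, event, score)
```

Each row of `completed` is one imputed data set. In a multiple-imputation analysis the downstream estimate would be computed on every row and the results combined; that combination step is omitted here.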
Pages: 3676-3689
Page count: 14
Related papers
33 in total
  • [1] A Review of Hot Deck Imputation for Survey Non-response
    Andridge, Rebecca R.
    Little, Roderick J. A.
    [J]. INTERNATIONAL STATISTICAL REVIEW, 2010, 78 (01) : 40 - 64
  • [2] A comparison of multiple imputation and doubly robust estimation for analyses with missing data
    Carpenter, James R.
    Kenward, Michael G.
    Vansteelandt, Stijn
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 2006, 169 : 571 - 584
  • [3] Multi-stage optimal dynamic treatment regimes for survival outcomes with dependent censoring
    Cho, Hunyong
    Holloway, Shannon T.
    Couper, David J.
    Kosorok, Michael R.
    [J]. BIOMETRIKA, 2022, : 395 - 410
  • [4] COX DR, 1972, J R STAT SOC B, V34, P187
  • [5] Nonparametric inverse-probability-weighted estimators based on the highly adaptive lasso
    Ertefaie, Ashkan
    Hejazi, Nima S.
    van der Laan, Mark J.
    [J]. BIOMETRICS, 2023, 79 (02) : 1029 - 1041
  • [6] Randomized phase II study of fludarabine plus cytosine arabinoside plus idarubicin ± all-trans retinoic acid ± granulocyte colony-stimulating factor in poor prognosis newly diagnosed acute myeloid leukemia and myelodysplastic syndrome
    Estey, EH
    Thall, PF
    Pierce, S
    Cortes, J
    Beran, M
    Kantarjian, H
    Keating, MJ
    Andreeff, M
    Freireich, E
    [J]. BLOOD, 1999, 93 (08) : 2478 - 2484
  • [7] Gill R. D., 1997, Proceedings of the First Seattle Symposium in Biostatistics, P255, DOI 10.1007/978-1-4684-6316-3_14
  • [8] Q-LEARNING WITH CENSORED DATA
    Goldberg, Yair
    Kosorok, Michael R.
    [J]. ANNALS OF STATISTICS, 2012, 40 (01) : 529 - 560
  • [9] Optimal two-stage dynamic treatment regimes from a classification perspective with censored survival data
    Hager, Rebecca
    Tsiatis, Anastasios A.
    Davidian, Marie
    [J]. BIOMETRICS, 2018, 74 (04) : 1180 - 1192
  • [10] Structural accelerated failure time models for survival analysis in studies with time-varying treatments
    Hernán, MA
    Cole, SR
    Margolick, J
    Cohen, M
    Robins, JM
    [J]. PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2005, 14 (07) : 477 - 491