Robust Q-Learning

Cited by: 18
Authors
Ertefaie, Ashkan [1]
McKay, James R. [2]
Oslin, David [3, 4, 5]
Strawderman, Robert L. [1]
Affiliations
[1] Univ Rochester, Dept Biostat & Computat Biol, 265 Crittenden Blvd,CU 420630, Rochester, NY 14642 USA
[2] Univ Penn, Dept Psychiat, Ctr Continuum Care Addict, Philadelphia, PA 19104 USA
[3] Univ Penn, Philadelphia Vet Adm Med Ctr, Philadelphia, PA 19104 USA
[4] Univ Penn, Treatment Res Ctr, Philadelphia, PA 19104 USA
[5] Univ Penn, Ctr Studies Addict, Dept Psychiat, Philadelphia, PA 19104 USA
Keywords
Cross-fitting; Data-adaptive techniques; Dynamic treatment strategies; Residual confounding; DYNAMIC TREATMENT REGIMES; DESIGN; INFERENCE; STRATEGIES; SELECTION
DOI
10.1080/01621459.2020.1753522
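Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714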
Abstract
Q-learning is a regression-based approach that is widely used to formalize the development of an optimal dynamic treatment strategy. Finite dimensional working models are typically used to estimate certain nuisance parameters, and misspecification of these working models can result in residual confounding and/or efficiency loss. We propose a robust Q-learning approach which allows estimating such nuisance parameters using data-adaptive techniques. We study the asymptotic behavior of our estimators and provide simulation studies that highlight the need for and usefulness of the proposed method in practice. We use the data from the "Extending Treatment Effectiveness of Naltrexone" multistage randomized trial to illustrate our proposed methods. Supplementary materials for this article are available online.
Pages: 368-381
Page count: 14