A smoothed Q-learning algorithm for estimating optimal dynamic treatment regimes

Cited by: 2
Authors
Fan, Yanqin [1 ]
He, Ming [2 ]
Su, Liangjun [3 ]
Zhou, Xiao-Hua [4 ,5 ]
Affiliations
[1] Univ Washington, Dept Econ, Seattle, WA 98195 USA
[2] Univ Technol Sydney, Econ Discipline Grp, Ultimo, Australia
[3] Singapore Management Univ, Sch Econ, Singapore, Singapore
[4] Peking Univ, Beijing Int Ctr Math Res, Beijing 100871, Peoples R China
[5] Peking Univ, Sch Publ Hlth, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
asymptotic normality; exceptional law; optimal smoothing parameter; sequential randomization; Wald-type inference; TECHNICAL CHALLENGES; INFERENCE;
DOI
10.1111/sjos.12359
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
In this paper, we propose a smoothed Q-learning algorithm for estimating optimal dynamic treatment regimes. In contrast to the Q-learning algorithm, in which nonregular inference is involved, we show that, under the assumptions adopted in this paper, the proposed smoothed Q-learning estimator is asymptotically normally distributed even when the Q-learning estimator is not, and its asymptotic variance can be consistently estimated. As a result, inference based on the smoothed Q-learning estimator is standard. We derive the optimal smoothing parameter and propose a data-driven method for estimating it. The finite sample properties of the smoothed Q-learning estimator are studied and compared with several existing estimators, including the Q-learning estimator, via an extensive simulation study. We illustrate the new method by analyzing data from the Clinical Antipsychotic Trials of Intervention Effectiveness-Alzheimer's Disease (CATIE-AD) study.
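To make the idea concrete, the sketch below illustrates the general mechanism the abstract describes: standard two-stage Q-learning forms a stage-1 pseudo-outcome by taking a hard max over stage-2 treatments, and the non-differentiability of that max is what breaks standard asymptotics; replacing it with a smoothed max restores smoothness in the parameters. This is a minimal illustration on simulated data, not the paper's method: the log-sum-exp smoother, the linear Q-models, the simulated data-generating process, and the fixed smoothing parameter `h` are all assumptions for demonstration (the paper derives an optimal, data-driven smoothing parameter).

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth_max(a, b, h):
    # Log-sum-exp smoothing of max(a, b); converges to the hard max as h -> 0.
    # Computed in a numerically stable way by factoring out the maximum.
    m = np.maximum(a, b)
    return m + h * np.log(np.exp((a - m) / h) + np.exp((b - m) / h))

# Simulate a two-stage trial: covariate X1, binary treatments A1, A2,
# intermediate covariate X2, and final outcome Y (illustrative model only).
n = 2000
X1 = rng.normal(size=n)
A1 = rng.integers(0, 2, size=n)
X2 = 0.5 * X1 + rng.normal(size=n)
A2 = rng.integers(0, 2, size=n)
Y = X1 + A1 * (0.5 - X1) + A2 * (0.3 + 0.4 * X2) + rng.normal(size=n)

# Stage 2: fit a linear Q-function Y ~ (1, X2, A2, A2*X2) by OLS.
D2 = np.column_stack([np.ones(n), X2, A2, A2 * X2])
beta2 = np.linalg.lstsq(D2, Y, rcond=None)[0]

def q2(x2, a2):
    return beta2[0] + beta2[1] * x2 + a2 * (beta2[2] + beta2[3] * x2)

# Pseudo-outcome: hard max (standard Q-learning) vs. smoothed max.
h = 0.1  # smoothing parameter (fixed here; chosen data-adaptively in the paper)
V_hard = np.maximum(q2(X2, 0), q2(X2, 1))
V_smooth = smooth_max(q2(X2, 0), q2(X2, 1), h)

# Stage 1: regress the smoothed pseudo-outcome on the stage-1 history.
D1 = np.column_stack([np.ones(n), X1, A1, A1 * X1])
beta1 = np.linalg.lstsq(D1, V_smooth, rcond=None)[0]

# Estimated stage-1 rule: treat iff the estimated A1 contrast is positive.
treat1 = (beta1[2] + beta1[3] * X1 > 0).astype(int)
```

Because `V_smooth` is a smooth function of the stage-2 coefficients, the stage-1 estimator inherits standard (Wald-type) asymptotics under the paper's assumptions, whereas the hard-max pseudo-outcome `V_hard` can yield nonregular behavior near the "exceptional law" where the two stage-2 Q-values coincide.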
Pages: 446-469
Number of pages: 24