A smoothed Q-learning algorithm for estimating optimal dynamic treatment regimes

Cited by: 2

Authors:

Fan, Yanqin [1]

He, Ming [2]

Su, Liangjun [3]

Zhou, Xiao-Hua [4,5]

Affiliations:

[1] Univ Washington, Dept Econ, Seattle, WA 98195 USA

[2] Univ Technol Sydney, Econ Discipline Grp, Ultimo, Australia

[3] Singapore Management Univ, Sch Econ, Singapore, Singapore

[4] Peking Univ, Beijing Int Ctr Math Res, Beijing 100871, Peoples R China

[5] Peking Univ, Sch Publ Hlth, Beijing 100191, Peoples R China

Source:

SCANDINAVIAN JOURNAL OF STATISTICS | 2019 / Vol. 46 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

asymptotic normality; exceptional law; optimal smoothing parameter; sequential randomization; Wald-type inference; TECHNICAL CHALLENGES; INFERENCE;

DOI:

10.1111/sjos.12359

Chinese Library Classification:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Subject classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

In this paper, we propose a smoothed Q-learning algorithm for estimating optimal dynamic treatment regimes. In contrast to the Q-learning algorithm in which nonregular inference is involved, we show that, under assumptions adopted in this paper, the proposed smoothed Q-learning estimator is asymptotically normally distributed even when the Q-learning estimator is not and its asymptotic variance can be consistently estimated. As a result, inference based on the smoothed Q-learning estimator is standard. We derive the optimal smoothing parameter and propose a data-driven method for estimating it. The finite sample properties of the smoothed Q-learning estimator are studied and compared with several existing estimators including the Q-learning estimator via an extensive simulation study. We illustrate the new method by analyzing data from the Clinical Antipsychotic Trials of Intervention Effectiveness-Alzheimer's Disease (CATIE-AD) study.
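The abstract does not give the estimator's formula, so the following is only a minimal sketch of the general idea, under stated assumptions: two stages, binary treatments coded 0/1, linear working models fitted by least squares, and the non-smooth term max(x, 0) = x * 1{x > 0} in the Q-learning pseudo-outcome replaced by x * Phi(x / h), where Phi is the standard normal CDF and h is a smoothing parameter. The function names, the choice of Phi, and the model form are all hypothetical and are not taken from the paper; the paper's data-driven choice of h is not reproduced here.

```python
import math
import numpy as np


def normal_cdf(x):
    """Standard normal CDF, applied elementwise (assumed smoothing function)."""
    x = np.asarray(x, dtype=float)
    erf = np.vectorize(math.erf, otypes=[float])
    return 0.5 * (1.0 + erf(x / math.sqrt(2.0)))


def smoothed_positive_part(x, h):
    """Smooth surrogate for max(x, 0) = x * 1{x > 0}.

    The indicator 1{x > 0} is replaced by Phi(x / h); as h -> 0 this
    approaches the usual non-smooth positive part.
    """
    x = np.asarray(x, dtype=float)
    return x * normal_cdf(x / h)


def fit_ols(X, y):
    """Least-squares coefficients for the linear model y ~ X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def smoothed_q_learning(H1, A1, H2, A2, Y, h):
    """Two-stage backward induction with a smoothed pseudo-outcome.

    H1, H2: (n, p) history/design matrices (include an intercept column).
    A1, A2: (n,) binary treatments coded 0/1.
    Y:      (n,) final outcome (larger is better).
    h:      positive smoothing parameter (hypothetical, user-supplied).
    """
    # Stage 2 working model: Q2 = H2 @ beta2 + A2 * (H2 @ psi2).
    X2 = np.hstack([H2, A2[:, None] * H2])
    theta2 = fit_ols(X2, Y)
    p2 = H2.shape[1]
    beta2, psi2 = theta2[:p2], theta2[p2:]

    # Pseudo-outcome: plain Q-learning would use max(H2 @ psi2, 0) here.
    contrast2 = H2 @ psi2
    pseudo = H2 @ beta2 + smoothed_positive_part(contrast2, h)

    # Stage 1 working model fitted to the smoothed pseudo-outcome.
    X1 = np.hstack([H1, A1[:, None] * H1])
    theta1 = fit_ols(X1, pseudo)
    p1 = H1.shape[1]
    return {
        "beta1": theta1[:p1], "psi1": theta1[p1:],
        "beta2": beta2, "psi2": psi2,
    }
```

In this sketch the estimated rule at each stage would treat when the fitted contrast (for example `H2 @ psi2`) is positive. Smoothing only changes the pseudo-outcome passed back to stage 1, which is the step where the max operator makes standard Q-learning inference nonregular.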


Pages: 446-469

Page count: 24

