TREE-BASED REINFORCEMENT LEARNING FOR ESTIMATING OPTIMAL DYNAMIC TREATMENT REGIMES

Cited by: 34
Authors
Tao, Yebin [1 ]
Wang, Lu [1 ]
Almirall, Daniel [2 ]
Affiliations
[1] Univ Michigan, Dept Biostat, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Inst Social Res, Ann Arbor, MI 48109 USA
Source
ANNALS OF APPLIED STATISTICS, 2018, Vol. 12, No. 3
Funding
US National Institutes of Health (NIH)
Keywords
Multi-stage decision-making; personalized medicine; classification; backward induction; decision tree; TREATMENT STRATEGIES; TREATMENT RULES; OUTCOMES; ILLNESS; MODELS
DOI
10.1214/18-AOAS1137
CLC classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Dynamic treatment regimes (DTRs) are sequences of treatment decision rules in which treatment may be adapted over time in response to the changing course of an individual. Motivated by a substance use disorder (SUD) study, we propose a tree-based reinforcement learning (T-RL) method to directly estimate optimal DTRs in a multi-stage, multi-treatment setting. At each stage, T-RL builds an unsupervised decision tree that directly handles the optimization problem with multiple treatment comparisons, through a purity measure constructed with augmented inverse probability weighted (AIPW) estimators. Across the multiple stages, the algorithm is implemented recursively using backward induction. By combining semiparametric regression with flexible tree-based learning, T-RL is robust, efficient, and easy to interpret for the identification of optimal DTRs, as shown in simulation studies. With the proposed method, we identify dynamic SUD treatment regimes for adolescents.
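The abstract describes a stagewise procedure: at each stage, subject-level counterfactual outcomes under each candidate treatment are estimated with augmented inverse probability weighting (AIPW), a decision tree is grown to maximize a purity measure built from those estimates, and the stages are linked by backward induction. The sketch below is an illustrative approximation only, not the authors' implementation: it covers a single stage, uses scikit-learn models, and substitutes a weighted classification tree fit to each subject's AIPW-best treatment in place of the paper's purity-based tree-growing. The function names (aipw_matrix, fit_stage_rule) and modeling choices (logistic propensity model, linear outcome model, shallow CART tree) are assumptions made for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def aipw_matrix(X, A, Y, treatments):
    """AIPW estimate of each subject's counterfactual mean outcome under
    every candidate treatment, using the standard form
    I(A=a)*Y/pi_a(X) - (I(A=a) - pi_a(X))/pi_a(X) * mu_a(X)."""
    n = len(Y)
    prop = LogisticRegression(max_iter=1000).fit(X, A)      # propensity model
    pi = prop.predict_proba(X)                               # n x K estimated propensities
    mu = np.zeros((n, len(treatments)))
    for k, a in enumerate(treatments):
        col = list(prop.classes_).index(a)                   # align probability column with arm a
        reg = LinearRegression().fit(X[A == a], Y[A == a])   # outcome model for arm a
        m_hat = reg.predict(X)
        ind = (A == a).astype(float)
        mu[:, k] = ind * Y / pi[:, col] - (ind - pi[:, col]) / pi[:, col] * m_hat
    return mu


def fit_stage_rule(X, A, Y, treatments, max_depth=3):
    """Approximate one stage of a tree-based treatment rule: fit a shallow
    classification tree to each subject's AIPW-best treatment, weighted by
    the estimated loss from assigning that subject incorrectly."""
    mu = aipw_matrix(X, A, Y, treatments)
    best = np.asarray(treatments)[mu.argmax(axis=1)]         # subject-level best arm
    gain = mu.max(axis=1) - mu.min(axis=1)                   # misclassification-cost proxy
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(X, best, sample_weight=gain)
    return tree                                              # tree.predict(X_new) gives the stage rule

In a multi-stage analysis, the backward induction described in the abstract would apply such a step at the final stage first, then, at each earlier stage, replace Y with a pseudo-outcome reflecting optimal treatment at all later stages.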
Pages: 1914-1938 (25 pages)
Related Papers (50 records in total)
  • [1] Song, Yao; Wang, Lu. Multiobjective tree-based reinforcement learning for estimating tolerant dynamic treatment regimes. BIOMETRICS, 2024, 80(01).
  • [2] Zhou, Nina; Wang, Lu; Almirall, Daniel. Estimating tree-based dynamic treatment regimes using observational data with restricted treatment sequences. BIOMETRICS, 2023, 79(03): 2260-2271.
  • [3] Sies, Aniek; Van Mechelen, Iven. Comparing four methods for estimating tree-based treatment regimes. INTERNATIONAL JOURNAL OF BIOSTATISTICS, 2017, 13(01).
  • [4] Sun, Yilun; Wang, Lu. Stochastic tree search for estimating optimal dynamic treatment regimes. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2020, 116(533): 421-432.
  • [5] Castelletti, A.; Galelli, S.; Restelli, M.; Soncini-Sessa, R. Tree-based reinforcement learning for optimal water reservoir operation. WATER RESOURCES RESEARCH, 2010, 46.
  • [6] Zhang, Junzhe; Bareinboim, Elias. Near-optimal reinforcement learning in dynamic treatment regimes. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32.
  • [7] Speth, Kelly A.; Elliott, Michael R.; Marquez, Juan L.; Wang, Lu. Penalized Spline-Involved Tree-based (PenSIT) Learning for estimating an optimal dynamic treatment regime using observational data. STATISTICAL METHODS IN MEDICAL RESEARCH, 2022, 31(12): 2338-2351.
  • [8] Tang, Ming; Wang, Lu; Gorin, Michael A.; Taylor, Jeremy M. G. Step-adjusted tree-based reinforcement learning for evaluating nested dynamic treatment regimes using test-and-treat observational data. STATISTICS IN MEDICINE, 2021, 40(27): 6164-6177.
  • [9] Yoon, Alfred P.; Song, Yao; Lin, I-Chun F.; Wang, Lu; Chung, Kevin C. Tree-based reinforcement learning for identifying optimal personalized treatment decisions for hand deformity in rheumatoid arthritis. PLASTIC AND RECONSTRUCTIVE SURGERY, 2024, 154(06): 1259-1266.
  • [10] Laber, E. B.; Zhao, Y. Q. Tree-based methods for individualized treatment regimes. BIOMETRIKA, 2015, 102(03): 501-514.