New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes

Cited by: 162
Authors
Zhao, Ying-Qi [1 ]
Zeng, Donglin [2 ]
Laber, Eric B. [3 ]
Kosorok, Michael R. [4 ,5 ]
Affiliations
[1] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53792 USA
[2] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
[3] N Carolina State Univ, Dept Stat, Raleigh, NC 27695 USA
[4] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
[5] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC 27599 USA
Keywords
Classification; Personalized medicine; Q-learning; Reinforcement learning; Risk bound; Support vector machine; SUPPORT VECTOR MACHINES; INDIVIDUALIZED TREATMENT RULES; ADAPTIVE TREATMENT STRATEGIES; CLINICAL-TRIALS; INFERENCE; DESIGN; RANDOMIZATION; CLASSIFIERS; PERFORMANCE; DECISIONS;
DOI
10.1080/01621459.2014.937488
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
Dynamic treatment regimes (DTRs) are sequential decision rules for individual patients that can adapt over time to an evolving illness. The goal is to accommodate heterogeneity among patients and find the DTR that will produce the best long-term outcome if implemented. We introduce two new statistical learning methods for estimating the optimal DTR, termed backward outcome weighted learning (BOWL) and simultaneous outcome weighted learning (SOWL). These approaches convert individualized treatment selection into either a sequential or a simultaneous classification problem, and can thus be applied by modifying existing machine learning techniques. The proposed methods are based on directly maximizing, over all DTRs, a nonparametric estimator of the expected long-term outcome; this is fundamentally different from regression-based methods such as Q-learning, which attempt such maximization indirectly and rely heavily on the correctness of postulated regression models. We prove that the resulting rules are consistent, and we provide finite-sample bounds for the errors incurred by the estimated rules. Simulation results suggest that the proposed methods produce superior DTRs compared with Q-learning, especially in small samples. We illustrate the methods using data from a clinical trial for smoking cessation. Supplementary materials for this article are available online.
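The core idea behind both BOWL and SOWL is outcome weighted learning: treatment selection is recast as a classification problem in which each patient's observed treatment is weighted by the outcome it produced, inverse-probability weighted by the treatment assignment probability, so that maximizing weighted classification accuracy maximizes a nonparametric estimate of the expected outcome. The sketch below illustrates only the single-stage building block of this idea, not the authors' BOWL/SOWL estimators; it assumes a randomized trial with known assignment probability 0.5, uses simulated data, and uses scikit-learn's weighted support vector machine as a stand-in classifier.

```python
# Illustrative single-stage sketch of outcome weighted learning (OWL), the idea
# underlying BOWL/SOWL.  NOT the authors' implementation: assumes a randomized
# trial with P(A = 1) = P(A = -1) = 0.5, simulated data, and a generic weighted
# SVM from scikit-learn as the classifier.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, p = 500, 5

X = rng.normal(size=(n, p))            # patient covariates
A = rng.choice([-1, 1], size=n)        # randomized treatment assignment
prop = np.full(n, 0.5)                 # known randomization probabilities

# Simulated reward: larger when the received treatment agrees with the
# (unknown to the analyst) optimal rule sign(X[:, 0] - X[:, 1]).
optimal = np.sign(X[:, 0] - X[:, 1])
R = 1.0 + 0.5 * X[:, 2] + (A == optimal) + rng.normal(scale=0.5, size=n)

# OWL maximizes an inverse-probability-weighted estimate of the value
#   E[ R * 1{A = d(X)} / pi(A | X) ].
# With nonnegative weights R / pi this is exactly a weighted classification
# problem: classify A from X with sample weights R / pi.  Shifting R by a
# constant keeps the weights nonnegative without changing the maximizer.
weights = (R - R.min()) / prop

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, A, sample_weight=weights)

d_hat = clf.predict(X)                 # estimated treatment rule d(X)
print("agreement with true optimal rule:", np.mean(d_hat == optimal))
```

BOWL applies a weighted classification of this form sequentially, working backward through the treatment stages, while SOWL optimizes all stages simultaneously; the sketch above covers only the single-decision case.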
Pages: 583-598
Page count: 16