New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes

Cited by: 162
Authors
Zhao, Ying-Qi [1]
Zeng, Donglin [2]
Laber, Eric B. [3]
Kosorok, Michael R. [4,5]
Affiliations
[1] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53792 USA
[2] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
[3] N Carolina State Univ, Dept Stat, Raleigh, NC 27695 USA
[4] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
[5] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC 27599 USA
Keywords
Classification; Personalized medicine; Q-learning; Reinforcement learning; Risk bound; Support vector machine; SUPPORT VECTOR MACHINES; INDIVIDUALIZED TREATMENT RULES; ADAPTIVE TREATMENT STRATEGIES; CLINICAL-TRIALS; INFERENCE; DESIGN; RANDOMIZATION; CLASSIFIERS; PERFORMANCE; DECISIONS
DOI
10.1080/01621459.2014.937488
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Dynamic treatment regimes (DTRs) are sequential decision rules for individual patients that can adapt over time to an evolving illness. The goal is to accommodate heterogeneity among patients and find the DTR that will produce the best long-term outcome if implemented. We introduce two new statistical learning methods for estimating the optimal DTR, termed backward outcome weighted learning (BOWL) and simultaneous outcome weighted learning (SOWL). These approaches convert individualized treatment selection into either a sequential or a simultaneous classification problem, and can thus be applied by modifying existing machine learning techniques. The proposed methods are based on directly maximizing, over all DTRs, a nonparametric estimator of the expected long-term outcome; this is fundamentally different from regression-based methods such as Q-learning, which attempt this maximization indirectly and rely heavily on the correctness of postulated regression models. We prove that the resulting rules are consistent, and we provide finite-sample bounds for the errors of the estimated rules. Simulation results suggest that the proposed methods produce superior DTRs compared with Q-learning, especially in small samples. We illustrate the methods using data from a clinical trial for smoking cessation. Supplementary materials for this article are available online.
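To make the outcome weighted classification idea concrete, below is a minimal sketch of single-stage outcome weighted learning, the building block that BOWL applies backward through the treatment stages. Everything here is an illustrative assumption rather than the authors' implementation: the data are simulated, the randomization probabilities are taken as known, and a weighted support vector machine from scikit-learn stands in for the actual optimization used in the paper. The sketch shows only the core idea of treating treatments as class labels weighted by outcome over propensity.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n, p = 500, 5
    X = rng.normal(size=(n, p))              # patient covariates
    A = rng.choice([-1, 1], size=n)          # randomized treatment, P(A = 1) = 0.5
    # Simulated outcome (larger is better); the true optimal rule is sign(X[:, 0])
    R = 1.0 + A * X[:, 0] + rng.normal(scale=0.5, size=n)

    pi = np.full(n, 0.5)                     # known randomization probabilities
    w = (R - R.min()) / pi                   # nonnegative outcome/propensity weights

    # Weighted SVM: treatments are class labels and outcomes act as sample
    # weights, so maximizing weighted classification accuracy serves as a
    # convex surrogate for maximizing the expected outcome of the rule.
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, A, sample_weight=w)

    d_hat = clf.predict(X)                   # estimated treatment rule d(X)
    print("agreement with true optimal rule:",
          np.mean(d_hat == np.sign(X[:, 0])))

In the multistage setting, BOWL would repeat such a weighted classification backward from the final stage, at each stage restricting attention to patients whose later treatments agree with the rules already estimated for the subsequent stages.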
Pages: 583-598
Page count: 16