PENALIZED Q-LEARNING FOR DYNAMIC TREATMENT REGIMENS

Times cited: 53
Authors
Song, Rui [1]
Wang, Weiwei [2]
Zeng, Donglin [3]
Kosorok, Michael R. [3]
Affiliations
[1] N Carolina State Univ, Dept Stat, Raleigh, NC 27695 USA
[2] Two Sigma Investment, New York, NY 10012 USA
[3] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
Keywords
Dynamic treatment regimen; individual selection; multi-stage; penalized Q-learning; Q-learning; shrinkage; two-stage procedure; 2-STAGE RANDOMIZATION DESIGNS; CLINICAL-TRIALS; SURVIVAL DISTRIBUTIONS; TREATMENT STRATEGIES; TREATMENT POLICIES; ORACLE PROPERTIES; LIKELIHOOD; INFERENCE; SELECTION; SUBJECT
DOI
10.5705/ss.2012.364
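Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714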
Abstract
A dynamic treatment regimen incorporates both accrued information and long-term effects of treatment from specially designed clinical trials. As these trials become increasingly common, alongside longitudinal data from clinical studies, developing statistical inference for optimal dynamic treatment regimens is a high priority. In this paper, we propose a new machine learning framework, penalized Q-learning, under which valid statistical inference is established. We also propose a new statistical procedure, individual selection, together with methods for incorporating it within penalized Q-learning. Extensive numerical studies compare the proposed methods with existing methods under a variety of scenarios and demonstrate that the proposed approach is superior both inferentially and computationally. The approach is illustrated with a depression clinical trial study.
Pages: 901-920
Page count: 20
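For orientation, the following is a minimal sketch of standard, unpenalized two-stage Q-learning with linear working models, fit by backward induction. It is not the penalized Q-learning or individual selection procedure proposed in this paper, whose penalty this record does not specify. Assumptions: treatments are coded -1/+1, each stage has one scalar covariate, and the helper names (`two_stage_q_learning`, `recommend`) and the simulated outcome model are hypothetical. The absolute value in the stage-2 pseudo-outcome is nondifferentiable wherever the fitted treatment contrast is zero; this is the non-regularity that the inference literature on Q-learning is concerned with.

```python
import numpy as np


def fit_ols(design, y):
    """Ordinary least squares coefficients via numpy's lstsq."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def two_stage_q_learning(x1, a1, x2, a2, y):
    """Unpenalized two-stage linear Q-learning by backward induction.

    Treatments are coded -1/+1. Returns the fitted treatment-contrast
    coefficients (psi0, psi1) for stage 1 and stage 2, where the
    contrast at stage t is psi0 + psi1 * x_t.
    """
    ones = np.ones(len(y))
    # Stage 2: history (x1, a1, x2) main effects plus a2 * (psi0 + psi1 * x2).
    d2 = np.column_stack([ones, x1, a1, x2, a2, a2 * x2])
    b2 = fit_ols(d2, y)
    psi2 = b2[4:]
    # Pseudo-outcome: fitted value under the best stage-2 action. With -1/+1
    # coding, the max over a2 of m + a2 * c equals m + |c|.
    main2 = d2[:, :4] @ b2[:4]
    y_tilde = main2 + np.abs(psi2[0] + psi2[1] * x2)
    # Stage 1: regress the pseudo-outcome on x1 and a1 * (psi0 + psi1 * x1).
    d1 = np.column_stack([ones, x1, a1, a1 * x1])
    b1 = fit_ols(d1, y_tilde)
    psi1 = b1[2:]
    return psi1, psi2


def recommend(psi, x):
    """Estimated rule: give +1 where the fitted contrast is positive, else -1."""
    return np.where(psi[0] + psi[1] * x > 0, 1, -1)


# Hypothetical simulated trial: both treatments randomized with probability 1/2.
rng = np.random.default_rng(0)
n = 20000
x1 = rng.normal(size=n)
a1 = rng.choice([-1, 1], size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
a2 = rng.choice([-1, 1], size=n)
# Assumed outcome model: stage-1 effect 0.3, stage-2 contrast 0.5 - x2.
y = x1 + x2 + 0.3 * a1 + a2 * (0.5 - x2) + rng.normal(scale=0.1, size=n)

psi1, psi2 = two_stage_q_learning(x1, a1, x2, a2, y)
print("stage-1 contrast coefficients:", psi1)
print("stage-2 contrast coefficients:", psi2)
print("stage-2 rule at x2 = 0 and 2:", recommend(psi2, np.array([0.0, 2.0])))
```

Because the treatments are randomized and coded -1/+1, the stage-1 treatment coefficients stay close to the assumed values (0.3, 0) even though the stage-1 main-effect model is misspecified; a penalized or otherwise regularized method would modify how the stage-2 contrast enters the pseudo-outcome.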