Q-learning for estimating optimal dynamic treatment rules from observational data

Cited by: 55
Authors
Moodie, Erica E. M. [1]
Chakraborty, Bibhas [2]
Kramer, Michael S. [1, 3]
Affiliations
[1] McGill Univ, Dept Epidemiol Biostat & Occupat Hlth, Montreal, PQ H3A 1A2, Canada
[2] Columbia Univ, Dept Biostat, New York, NY 10032 USA
[3] McGill Univ, Dept Pediat, Montreal, PQ H3H 1P3, Canada
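Source
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE | 2012 / Vol. 40 / Issue 04
Funding
Natural Sciences and Engineering Research Council of Canada; US National Institutes of Health; Canadian Institutes of Health Research;
Keywords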
Bias; confounding; dynamic treatment regime; inverse probability of treatment weighting; non-regularity; propensity scores; COGNITIVE-DEVELOPMENT; TREATMENT REGIMES; PROPENSITY SCORE; INFANT GROWTH; INFERENCE;
DOI
10.1002/cjs.11162
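Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code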
020208 ; 070103 ; 0714 ;
Abstract
The area of dynamic treatment regimes (DTR) aims to make inference about adaptive, multistage decision-making in clinical practice. A DTR is a set of decision rules, one per interval of treatment, where each decision is a function of treatment and covariate history that returns a recommended treatment. Q-learning is a popular method from the reinforcement learning literature that has recently been applied to estimate DTRs. While, in principle, Q-learning can be used for both randomized and observational data, the focus in the literature thus far has been exclusively on the randomized treatment setting. We extend the method to incorporate measured confounding covariates, using direct adjustment and a variety of propensity score approaches. The methods are examined under various settings including non-regular scenarios. We illustrate the methods in examining the effect of breastfeeding on vocabulary testing, based on data from the Promotion of Breastfeeding Intervention Trial. The Canadian Journal of Statistics 40: 629-645; 2012 © 2012 Statistical Society of Canada
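The abstract describes Q-learning with direct adjustment for measured confounders only in words. The following is a minimal sketch of what a two-stage version could look like, not the authors' implementation. It assumes linear Q-functions, treatments coded -1/+1, and a single covariate per stage that serves as both confounder and tailoring variable. All function names, variable names, and the simulated data are hypothetical and chosen for illustration; the propensity score approaches mentioned in the abstract are not shown.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def q_learning_two_stage(o1, a1, o2, a2, y):
    """Two-stage Q-learning with linear Q-functions (treatments coded -1/+1).

    Direct adjustment here means the covariates o1 and o2 enter each
    regression as main effects alongside the treatment terms.
    """
    ones = np.ones(len(y))

    # Stage 2: regress the outcome on history plus treatment terms.
    X2 = np.column_stack([ones, o1, a1, o2, a2, a2 * o2])
    b2 = fit_ols(X2, y)

    # Estimated stage-2 treatment contrast; with -1/+1 coding the best
    # achievable treatment term is its absolute value.
    contrast2 = b2[4] + b2[5] * o2
    pseudo = X2[:, :4] @ b2[:4] + np.abs(contrast2)

    # Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
    X1 = np.column_stack([ones, o1, a1, a1 * o1])
    b1 = fit_ols(X1, pseudo)
    return b1, b2


def recommend(contrast):
    """Recommend +1 when the estimated contrast is non-negative, else -1."""
    return np.where(contrast >= 0, 1, -1)


# Simulated (hypothetical) observational data: treatment assignment
# depends on the covariates, so the covariates confound the treatment effect.
rng = np.random.default_rng(0)
n = 500
o1 = rng.normal(size=n)
a1 = np.where(rng.random(n) < 1 / (1 + np.exp(-o1)), 1, -1)
o2 = 0.5 * o1 + 0.3 * a1 + rng.normal(size=n)
a2 = np.where(rng.random(n) < 1 / (1 + np.exp(-o2)), 1, -1)
y = 1 + o1 + 0.5 * a1 + o2 + a2 * (0.5 - o2)  # noise-free for clarity

b1, b2 = q_learning_two_stage(o1, a1, o2, a2, y)
stage2_rule = recommend(b2[4] + b2[5] * o2)
```

In this sketch the estimated stage-2 rule recommends treatment +1 whenever the fitted contrast `b2[4] + b2[5] * o2` is non-negative. When that contrast is close to zero for a non-negligible share of subjects, the absolute value makes the stage-1 pseudo-outcome non-smooth in the stage-2 parameters, which is the kind of non-regular scenario the abstract refers to.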
Pages: 629-645
Number of pages: 17