Q-Learning: Flexible Learning About Useful Utilities

Cited by: 57
Authors
Moodie E.E.M. [1]
Dean N. [2]
Sun Y.R. [3]
Affiliations
[1] Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, QC
[2] School of Mathematics and Statistics, University of Glasgow, Glasgow, Scotland
[3] Department of Mathematics and Statistics, School of Computer Science, McGill University, Montreal, QC
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Adaptive treatment strategies; Discrete data; Dynamic treatment regimes; Generalized additive models; Personalized medicine; Q-learning
DOI
10.1007/s12561-013-9103-z
Abstract
Dynamic treatment regimes are fast becoming an important part of medicine, with the corresponding change in emphasis from treatment of the disease to treatment of the individual patient. Because of the limited number of trials to evaluate personally tailored treatment sequences, inferring optimal treatment regimes from observational data has increased importance. Q-learning is a popular method for estimating the optimal treatment regime, originally in randomized trials but more recently also in observational data. Previous applications of Q-learning have largely been restricted to continuous utility end-points with linear relationships. This paper is the first attempt at both extending the framework to discrete utilities and implementing the modelling of covariates from linear to more flexible modelling using the generalized additive model (GAM) framework. Simulated data results show that the GAM adapted Q-learning typically outperforms Q-learning with linear models and other frequently-used methods based on propensity scores in terms of coverage and bias/MSE. This represents a promising step toward a more fully general Q-learning approach to estimating optimal dynamic treatment regimes. © 2013, International Chinese Statistical Association.
Pages: 223-243
Page count: 20