A Comparative Tutorial of Bayesian Sequential Design and Reinforcement Learning

Cited by: 3
Authors
Tec, Mauricio [1]
Duan, Yunshan [2]
Muller, Peter [2]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Cambridge, MA 02138 USA
[2] Univ Texas Austin, Dept Stat & Data Sci, Austin, TX 78712 USA
Keywords
Bayesian methods; Experimental design; Reinforcement learning
DOI
10.1080/00031305.2022.2129787
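Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract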
Reinforcement learning (RL) is a computational approach to reward-driven learning in sequential decision problems. It implements the discovery of optimal actions by learning from an agent interacting with an environment rather than from supervised data. We contrast and compare RL with traditional sequential design, focusing on simulation-based Bayesian sequential design (BSD). Recently, there has been an increasing interest in RL techniques for healthcare applications. We introduce two related applications as motivating examples. In both applications, the sequential nature of the decisions is restricted to sequential stopping. Rather than a comprehensive survey, the focus of the discussion is on solutions using standard tools for these two relatively simple sequential stopping problems. Both problems are inspired by adaptive clinical trial design. We use examples to explain the terminology and mathematical background that underlie each framework and map one to the other. The implementations and results illustrate the many similarities between RL and BSD. The results motivate the discussion of the potential strengths and limitations of each approach.
Pages: 223-233
Page count: 11