Where's the Reward?: A Review of Reinforcement Learning for Instructional Sequencing

Cited by: 40
Authors
Doroudi, Shayan [1 ,2 ,3 ]
Aleven, Vincent [4 ]
Brunskill, Emma [2 ]
Affiliations
[1] Carnegie Mellon Univ, Comp Sci Dept, Pittsburgh, PA 15213 USA
[2] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA
[3] Univ Calif Irvine, Sch Educ, Irvine, CA 92697 USA
[4] Carnegie Mellon Univ, Human Comp Interact Inst, Pittsburgh, PA 15213 USA
Keywords
Reinforcement learning; Instructional sequencing; Adaptive instruction; History of artificial intelligence in education; Teaching strategies; Worked examples; Knowledge; Allocation; Expertise; Game; Efficiency; Retention; Selection; Improve
DOI
10.1007/s40593-019-00187-x
Chinese Library Classification
TP39 [Computer applications];
Discipline Classification Codes
081203; 0835;
Abstract
Since the 1960s, researchers have been trying to optimize the sequencing of instructional activities using the tools of reinforcement learning (RL) and sequential decision making under uncertainty. Many researchers have realized that reinforcement learning provides a natural framework for optimal instructional sequencing given a particular model of student learning, and excitement towards this area of research is as alive now as it was over fifty years ago. But does RL actually help students learn? If so, when and where might we expect it to be most helpful? To help answer these questions, we review the variety of attempts to use RL for instructional sequencing. First, we present a historical narrative of this research area. We identify three waves of research, which gives us a sense of the various communities of researchers that have been interested in this problem and where the field is going. Second, we review all of the empirical research that has compared RL-induced instructional policies to baseline methods of sequencing. We find that over half of the studies found that RL-induced policies significantly outperform baselines. Moreover, we identify five clusters of studies with different characteristics and varying levels of success in using RL to help students learn. We find that reinforcement learning has been most successful in cases where it has been constrained with ideas and theories from cognitive psychology and the learning sciences. However, given that our theories and models are limited, we also find that it has been useful to complement this approach with running more robust offline analyses that do not rely heavily on the assumptions of one particular model. Given that many researchers are turning to deep reinforcement learning and big data to tackle instructional sequencing, we believe keeping these best practices in mind can help guide the way to the reward in using RL for instructional sequencing.
Pages: 568-620 (53 pages)
Related Papers (50 total)
• [41] Borgarelli, Andrea; Enea, Constantin; Majumdar, Rupak; Nagendra, Srinidhi. Reward Augmentation in Reinforcement Learning for Testing Distributed Systems. Proceedings of the ACM on Programming Languages (PACMPL), 2024, 8 (OOPSLA2).
• [42] Campbell, Jeffrey S.; Givigi, Sidney N.; Schwartz, Howard M. Handling Stochastic Reward Delays in Machine Reinforcement Learning. 2015 IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE), 2015: 314-319.
• [43] Sun, Chuxiong; Wang, Rui; Li, Qian; Hu, Xiaohui. Reward Space Noise for Exploration in Deep Reinforcement Learning. International Journal of Pattern Recognition and Artificial Intelligence, 2021, 35 (10).
• [44] Zhuang, Shengyao; Qiao, Zhihao; Zuccon, Guido. Reinforcement Online Learning to Rank with Unbiased Reward Shaping. Information Retrieval Journal, 2022, 25: 386-413.
• [45] Deng, Zelin; Dong, Yunlong; Liu, Xing. Reward Shaping in Reinforcement Learning for Robotic Hand Manipulation. Neurocomputing, 2025, 638.
• [46] Sun, Haoran; Zhu, Xiaolong; Zhou, Conghua. Deep Reinforcement Learning for Video Summarization with Semantic Reward. 2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C), 2022: 754-755.
• [47] Hirtz, Thomas; Tian, He; Yang, Yi; Ren, Tian-Ling. Unsupervised Reward Engineering for Reinforcement Learning Controlled Manufacturing. Journal of Intelligent Manufacturing, 2024.
• [48] Sejnova, Gabriela; Mejdrechova, Megi; Otahal, Marek; Sokovnin, Nikita; Farkas, Igor; Vavrecka, Michal. Reward Redistribution for Reinforcement Learning of Dynamic Nonprehensile Manipulation. 2021 7th International Conference on Control, Automation and Robotics (ICCAR), 2021: 326-331.
• [50] Tadepalli, P.; Ok, D. Model-Based Average Reward Reinforcement Learning. Artificial Intelligence, 1998, 100 (1-2): 177-224.