Exploring the Impact of Simple Explanations and Agency on Batch Deep Reinforcement Learning Induced Pedagogical Policies

Cited by: 15
Authors
Ausin, Markel Sanz [1 ]
Maniktala, Mehak [1 ]
Barnes, Tiffany [1 ]
Chi, Min [1 ]
Affiliations
[1] North Carolina State Univ, Raleigh, NC 27695 USA
Source
ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2020), PT I | 2020 / Vol. 12163
Funding
US National Science Foundation;
Keywords
Deep reinforcement learning; Pedagogical policy; Explanation; WORKED EXAMPLES; MOTIVATION; SYSTEM; TUTORS; LEVEL; TASK; HELP; GO;
DOI
10.1007/978-3-030-52237-7_38
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, reinforcement learning (RL), and especially deep RL (DRL), has shown outstanding performance in video games, from Atari and Mario to StarCraft. However, there is little evidence that DRL can be successfully applied to real-life human-centric tasks such as education or healthcare. Unlike classic game playing, where the goal of RL is to make an agent smart, in human-centric tasks the ultimate goal is to make human-agent interactions productive and fruitful. Additionally, in many real-life human-centric tasks, data can be noisy and limited. As a sub-field of RL, batch RL is designed to handle situations where data is limited and noisy and building simulations is challenging. In two consecutive classroom studies, we investigated applying batch DRL to the task of pedagogical policy induction for an Intelligent Tutoring System (ITS) and empirically evaluated the effectiveness of the induced pedagogical policies. In Fall 2018 (F18), the DRL-induced policy was compared against an expert-designed baseline policy; in Spring 2019 (S19), we examined the impact of pairing the batch DRL-induced policy with simple explanations, comparing it against both student decision making and the expert baseline policy. Our results showed that 1) while no significant difference was found between the batch RL-induced policy and the expert policy in F18, the batch RL-induced policy with simple explanations improved students' learning performance significantly more than the expert policy alone in S19; and 2) no significant difference was found between student decision making and the expert policy. Overall, our results suggest that pairing simple explanations with induced RL policies can be an important and effective technique for applying RL to real-life human-centric tasks.
Pages: 472-485
Number of pages: 14
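As an illustrative aside (not part of the record): the abstract's core technique, inducing a policy offline from a fixed log of interactions rather than by querying a live environment, can be sketched with a minimal batch Q-learning loop in Python. Everything below is a hypothetical stand-in: the two-action set (worked example vs. problem solving), the toy states and rewards, and the hyperparameters. The paper itself uses deep function approximation over real classroom logs, not this tabular toy.

import random
from collections import defaultdict

# Minimal batch (offline) Q-learning sketch: the agent never interacts
# with an environment; it only replays a fixed log of transitions, which
# is the setting the abstract describes. All names/values are assumed.
ACTIONS = ("worked_example", "problem_solving")  # hypothetical action set
GAMMA = 0.9    # discount factor (assumed)
ALPHA = 0.1    # learning rate (assumed)
EPOCHS = 50    # passes over the fixed batch

# Hypothetical logged tuples: (state, action, reward, next_state, done).
# In a real ITS, states would be student features and the reward a
# delayed learning-gain signal computed after the session.
transitions = [
    ("low_mastery", "worked_example", 0.0, "mid_mastery", False),
    ("mid_mastery", "problem_solving", 0.0, "high_mastery", False),
    ("high_mastery", "problem_solving", 1.0, "terminal", True),
]

Q = defaultdict(float)  # Q[(state, action)] -> estimated return

for _ in range(EPOCHS):
    random.shuffle(transitions)  # decorrelate updates within the batch
    for s, a, r, s2, done in transitions:
        # Bootstrapped one-step target built purely from logged data.
        target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Induced greedy policy: pick the highest-valued action in each seen state.
policy = {s: max(ACTIONS, key=lambda b: Q[(s, b)])
          for s, _, _, _, _ in transitions}
print(policy)

Because every update replays logged data, extra training passes cost nothing beyond computation, which is why batch RL suits domains like tutoring where collecting new student interactions is slow and expensive.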