Hierarchical learning from human preferences and curiosity

Cited by: 3
Authors
Bougie, Nicolas [1 ,2 ]
Ichise, Ryutaro [1 ,2 ]
Affiliations
[1] Grad Univ Adv Studies Sokendai, Tokyo, Japan
[2] Natl Inst Informat, Tokyo, Japan
Keywords
Hierarchical reinforcement learning; Preference-based learning; Curiosity; Human guidance; NEURAL-NETWORKS; EXPLORATION;
DOI
10.1007/s10489-021-02726-3
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Recent success in scaling deep reinforcement learning (DRL) algorithms to complex problems has been driven by well-designed extrinsic rewards, which limits their applicability to the many real-world tasks where rewards are naturally extremely sparse. One solution to this problem is to introduce human guidance to drive the agent's learning. Although low-level demonstrations are a promising form of guidance, they can be difficult for experts to provide, since some tasks require a large number of high-quality demonstrations. In this work, we explore human guidance in the form of high-level preferences between sub-goals, which leads to drastic reductions in both human effort and the cost of exploration. We design a novel hierarchical reinforcement learning method that introduces non-expert human preferences at the high level and uses curiosity to drastically speed up the convergence of sub-policies toward any sub-goal. We further propose a curiosity-based strategy to automatically discover sub-goals. We evaluate the proposed method on 2D navigation tasks, robotic control tasks, and image-based video games (Atari 2600), which have high-dimensional observations, sparse rewards, and complex state dynamics. The experimental results show that the proposed method learns significantly faster than traditional hierarchical RL methods and drastically reduces the amount of human effort required compared with standard imitation learning approaches.
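
The record does not include the method's details, but as a rough illustration of the two-level idea in the abstract, the sketch below pairs a Bradley-Terry preference model over sub-goals (high level) with a forward-model prediction-error bonus as curiosity (low level). All names, shapes, losses, and the placeholder sub-policy/environment are illustrative assumptions, not the authors' implementation.

# Conceptual sketch only; every design choice here is an assumption for illustration.
import numpy as np

rng = np.random.default_rng(0)

N_SUBGOALS, STATE_DIM, ACTION_DIM = 4, 8, 2

# High level: one scalar utility per sub-goal, learned from pairwise human preferences.
utilities = np.zeros(N_SUBGOALS)

def update_from_preference(preferred, other, lr=0.1):
    # Bradley-Terry gradient step: raise the preferred sub-goal's utility.
    p = 1.0 / (1.0 + np.exp(utilities[other] - utilities[preferred]))
    utilities[preferred] += lr * (1.0 - p)
    utilities[other] -= lr * (1.0 - p)

# Low level: curiosity bonus = prediction error of a linear forward model.
W = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM, STATE_DIM))

def curiosity_bonus(state, action, next_state, lr=0.01):
    x = np.concatenate([state, action])
    err = next_state - x @ W
    W[:] += lr * np.outer(x, err)   # online update of the forward model
    return float(np.sum(err ** 2))  # intrinsic reward for the sub-policy

for episode in range(5):
    # High level samples a sub-goal, favouring those preferred so far.
    probs = np.exp(utilities) / np.exp(utilities).sum()
    subgoal = rng.choice(N_SUBGOALS, p=probs)

    state = rng.normal(size=STATE_DIM)
    for t in range(10):
        action = rng.normal(size=ACTION_DIM)      # placeholder sub-policy
        next_state = rng.normal(size=STATE_DIM)   # placeholder environment step
        r_int = curiosity_bonus(state, action, next_state)
        # A real sub-policy would be updated here with r_int plus any
        # sub-goal-reaching reward.
        state = next_state

    # Simulated human feedback: prefer the pursued sub-goal over a random alternative.
    other = int(rng.integers(N_SUBGOALS))
    if other != subgoal:
        update_from_preference(subgoal, other)

print("learned sub-goal utilities:", np.round(utilities, 3))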
Pages: 7459-7479
Page count: 21