Data-Efficient Hierarchical Reinforcement Learning for Robotic Assembly Control Applications

Cited by: 54
Authors
Hou, Zhimin [1 ,2 ]
Fei, Jiajun [1 ,2 ]
Deng, Yuelin [1 ,2 ]
Xu, Jing [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Beijing Key Lab Precis Ultra Precis Mfg Equipment, State Key Lab Tribol, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Mech Engn, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
Data-efficiency; hierarchical reinforcement learning; robotic assembly control; IMPEDANCE CONTROL;
DOI
10.1109/TIE.2020.3038072
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Hierarchical reinforcement learning (HRL) can learn decomposed subpolicies corresponding to local regions of the state space; it is therefore a promising solution for complex robotic assembly control tasks that require fewer interactions with the environment. Most existing HRL algorithms rely on on-policy learning, in which fresh samples must be collected at every training step. In this article, we propose a data-efficient HRL algorithm based on off-policy learning, with three main contributions. First, two augmented Markov decision processes (MDPs) are formulated so that the higher-level policy and the lower-level policies can be learned from the same samples. Second, to learn a higher-level policy that enables efficient exploration, a softmax gating policy is derived to select the lower-level policy that interacts with the environment. Third, to learn the lower-level policies from off-policy samples drawn from a single lower-level replay buffer, the higher-level policy derived from the option-value network is adopted to select the appropriate option for training the corresponding lower-level policy. The data efficiency of our algorithm is validated on two simulated tasks and on real-world robotic dual peg-in-hole assembly tasks.
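
As an illustration of the softmax gating idea described in the abstract, the sketch below shows one plausible way to sample an option (i.e., a lower-level policy index) from the outputs of an option-value network. This is a minimal sketch under assumed names (softmax_gating, option_values, temperature), not the authors' implementation.

    import numpy as np

    def softmax_gating(option_values, temperature=1.0):
        # option_values: estimates Q(s, o) from an option-value network,
        # one entry per lower-level policy (option).
        logits = np.asarray(option_values, dtype=np.float64) / temperature
        logits -= logits.max()  # subtract the max for numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        # Sample the option whose lower-level policy will interact with
        # the environment; lower temperature yields greedier selection.
        return np.random.choice(len(probs), p=probs)

    # Example: three options with assumed values Q(s, .) = [1.2, 0.4, 0.9]
    option = softmax_gating([1.2, 0.4, 0.9], temperature=0.5)

Sampling from a softmax over option values, rather than always taking the argmax, keeps exploration stochastic at the higher level while still favoring options with higher estimated value.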
Pages: 11565-11575
Number of pages: 11