Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning

Cited: 0
Authors
Chen, Yuanpei [1 ]
Wu, Tianhao [2 ]
Wang, Shengjie [1 ]
Feng, Xidong [1 ,4 ]
Jiang, Jiechuan [3 ]
McAleer, Stephen Marcus [5 ]
Dong, Hao [2 ]
Lu, Zongqing [1 ,3 ]
Zhu, Song-Chun [1 ,6 ]
Yang, Yaodong [1 ,6 ]
Affiliations
[1] Peking Univ, Inst AI, Beijing, Peoples R China
[2] Peking Univ, Ctr Frontiers Comp Studies, Beijing, Peoples R China
[3] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
[4] UCL, London, England
[5] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[6] Beijing Inst Gen Artificial Intelligence, Beijing, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022
Keywords
GRASP;
DOI
None available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Achieving human-level dexterity is an important open problem in robotics. However, dexterous-hand manipulation tasks, even at the level of a human infant, are challenging to solve through reinforcement learning (RL). The difficulty lies in the high degrees of freedom and the cooperation required among heterogeneous agents (e.g., the joints of the fingers). In this study, we propose the Bimanual Dexterous Hands Benchmark (Bi-DexHands), a simulator that involves two dexterous hands, tens of bimanual manipulation tasks, and thousands of target objects. Specifically, tasks in Bi-DexHands are designed to match different levels of human motor skill according to the cognitive-science literature. We built Bi-DexHands on Isaac Gym, which enables highly efficient RL training, reaching 30,000+ FPS on a single NVIDIA RTX 3090. We provide a comprehensive benchmark for popular RL algorithms under different settings, including single-agent/multi-agent RL, offline RL, multi-task RL, and meta-RL. Our results show that on-policy algorithms of the PPO type can master simple manipulation tasks equivalent to those of human infants up to 48 months old (e.g., catching a flying object, opening a bottle), while multi-agent RL can further help master manipulations that require skilled bimanual cooperation (e.g., lifting a pot, stacking blocks). Despite success on each individual task, existing RL algorithms fail in most multi-task and few-shot learning settings when it comes to acquiring multiple manipulation skills, which calls for substantial further development from the RL community. Our project is open-sourced at https://github.com/PKU-MARL/DexterousHands.
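The multi-agent framing described in the abstract (each hand, or even each finger, treated as a cooperating agent sharing one task reward) can be sketched as a standard Gym-style interaction loop. The toy environment and all dimensions below are hypothetical, invented purely for illustration; the real Bi-DexHands tasks run on Isaac Gym with thousands of parallel simulated environments and a different API.

```python
import numpy as np

class DummyBimanualEnv:
    """Toy stand-in for a bimanual task: two 'hands' act as separate
    agents, each with its own observation and action vector, but they
    receive a single shared reward that encourages cooperation.
    All sizes here are made up for illustration."""

    NUM_AGENTS = 2   # left hand, right hand
    OBS_DIM = 24     # hypothetical per-hand observation size
    ACT_DIM = 20     # hypothetical per-hand joint-command size

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return [self.rng.standard_normal(self.OBS_DIM)
                for _ in range(self.NUM_AGENTS)]

    def step(self, actions):
        assert len(actions) == self.NUM_AGENTS
        self.t += 1
        obs = [self.rng.standard_normal(self.OBS_DIM)
               for _ in range(self.NUM_AGENTS)]
        # Placeholder shared reward: penalize large joint commands.
        reward = -float(sum(np.square(a).sum() for a in actions))
        done = self.t >= 50
        return obs, reward, done

def rollout(env, policy, max_steps=50):
    """Run one episode with a decentralized policy (one call per hand)
    and return the cumulative shared reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        actions = [policy(o) for o in obs]  # one action per agent
        obs, reward, done = env.step(actions)
        total += reward
        if done:
            break
    return total

# A trivial "do nothing" policy; a learned policy would map obs -> action.
zero_policy = lambda o: np.zeros(DummyBimanualEnv.ACT_DIM)
ret = rollout(DummyBimanualEnv(), zero_policy)
```

In the multi-agent RL setting benchmarked by the paper, each agent would hold its own policy network while critics may share the global state; the single-agent setting simply concatenates both hands' observations and actions into one policy.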
Pages: 14