Learning high-level robotic manipulation actions with visual predictive model

Cited by: 1
Authors
Ma, Anji [1 ]
Chi, Guoyi [2 ]
Ivaldi, Serena [3 ]
Chen, Lipeng [4 ]
Affiliations
[1] Beijing Inst Technol, Sch Mechatron Engn, Beijing, Peoples R China
[2] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore
[3] Univ Lorraine, Inria, CNRS, Loria, F-54000 Nancy, France
[4] Univ Leeds, Sch Comp, Leeds, England
Keywords
Robot manipulation; Visual foresight; Visual perception; Deep learning; Grasp planning;
DOI
10.1007/s40747-023-01174-5
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Learning visual predictive models has great potential for real-world robot manipulation. A visual predictive model serves as a model of real-world dynamics, capturing the interactions between the robot and objects. However, prior works in the literature have focused mainly on low-level elementary robot actions, which typically result in lengthy, inefficient, and highly complex manipulation sequences. In contrast, humans usually employ top-down reasoning over high-level actions rather than bottom-up stacking of low-level ones. To address this limitation, we present a novel formulation of robot manipulation as a sequence of pick-and-place operations, a commonly applied high-level robot action realized through grasping. We propose a novel visual predictive model that combines an action decomposer with a video prediction network to learn the intrinsic semantic information of high-level actions. Experiments show that our model can accurately predict object dynamics (i.e., object movements under robot manipulation) when trained directly on observations of high-level pick-and-place actions. We also demonstrate that, combined with a sampling-based planner, our model achieves a higher success rate using high-level actions on a variety of real robot manipulation tasks.
Pages: 811-823
Page count: 13