Learning high-level robotic manipulation actions with visual predictive model

Cited by: 1
Authors
Ma, Anji [1 ]
Chi, Guoyi [2 ]
Ivaldi, Serena [3 ]
Chen, Lipeng [4 ]
Affiliations
[1] Beijing Inst Technol, Sch Mechatron Engn, Beijing, Peoples R China
[2] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore
[3] Univ Lorraine, Inria, CNRS, Loria, F-54000 Nancy, France
[4] Univ Leeds, Sch Comp, Leeds, England
Keywords
Robot manipulation; Visual foresight; Visual perception; Deep learning; Grasp planning;
DOI
10.1007/s40747-023-01174-5
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Learning visual predictive models has great potential for real-world robot manipulation. A visual predictive model serves as a model of real-world dynamics, capturing the interactions between the robot and objects. However, prior works in the literature have focused mainly on low-level elementary robot actions, which typically result in lengthy, inefficient, and highly complex manipulation sequences. In contrast, humans usually employ top-down reasoning over high-level actions rather than bottom-up stacking of low-level ones. To address this limitation, we present a novel formulation of robot manipulation as a sequence of pick-and-place operations, a commonly applied high-level robot action realized through grasping. We propose a novel visual predictive model that combines an action decomposer with a video prediction network to learn the intrinsic semantic information of high-level actions. Experiments show that our model can accurately predict object dynamics (i.e., object movements under robot manipulation) when trained directly on observations of high-level pick-and-place actions. We also demonstrate that, combined with a sampling-based planner, our model achieves a higher success rate using high-level actions on a variety of real robot manipulation tasks.
Pages: 811-823
Page count: 13