When Transformer Meets Robotic Grasping: Exploits Context for Efficient Grasp Detection

Cited by: 69
Authors
Wang, Shaochen [1]
Zhou, Zhangli [1]
Kan, Zhen [1]
Affiliations
[1] Univ Sci & Technol China, Dept Automat, Hefei 230026, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Grasp detection; robotic grasping; vision transformer
DOI
10.1109/LRA.2022.3187261
Chinese Library Classification number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
In this letter, we present a transformer-based architecture, namely TF-Grasp, for robotic grasp detection. The TF-Grasp framework has two key designs that make it well suited to visual grasping tasks. First, we adopt local window attention to capture local contextual information and detailed features of graspable objects, and then apply cross-window attention to model long-range dependencies between distant pixels. Object knowledge, environmental configuration, and relationships between different visual entities are aggregated for subsequent grasp detection. Second, we build a hierarchical encoder-decoder architecture with skip connections that deliver shallow features from the encoder to the decoder, enabling multi-scale feature fusion. Owing to the attention mechanism, TF-Grasp can simultaneously capture local information (e.g., the contours of objects) and model long-range connections such as the relationships between distinct visual concepts in clutter. Extensive computational experiments demonstrate that TF-Grasp achieves competitive results versus state-of-the-art convolutional grasping models and attains a higher accuracy of 97.99% and 94.6% on the Cornell and Jacquard grasping datasets, respectively. Real-world experiments using a 7-DoF Franka Emika Panda robot also demonstrate its capability of grasping unseen objects in a variety of scenarios.
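
The window-based attention described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes non-overlapping square windows and a cyclic shift between rounds so that the second round of windows straddles the first round's borders. The abstract does not specify either mechanism. The names `window_partition`, `cyclic_shift`, and `self_attention` are hypothetical, and the attention uses identity projections (Q = K = V) with no learned weights and no masking of wrapped-around windows.

```python
import math


def window_partition(grid, w):
    """Split an H x W grid (list of rows) into non-overlapping w x w windows.

    Each window is returned as a flat list of tokens in row-major order.
    """
    H, W = len(grid), len(grid[0])
    assert H % w == 0 and W % w == 0, "grid size must be a multiple of w"
    windows = []
    for r0 in range(0, H, w):
        for c0 in range(0, W, w):
            windows.append(
                [grid[r][c] for r in range(r0, r0 + w) for c in range(c0, c0 + w)]
            )
    return windows


def cyclic_shift(grid, s):
    """Roll the grid s positions up and left (assumed cross-window scheme)."""
    H, W = len(grid), len(grid[0])
    return [[grid[(r + s) % H][(c + s) % W] for c in range(W)] for r in range(H)]


def self_attention(tokens):
    """Single-head scaled dot-product attention with Q = K = V = tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(
            [sum(wt / z * v[j] for wt, v in zip(weights, tokens)) for j in range(d)]
        )
    return out


# Label each pixel of a 4 x 4 image with its (row, col) position.
ids = [[(r, c) for c in range(4)] for r in range(4)]

# Round 1: attention would run inside each 2 x 2 window (local context).
local = window_partition(ids, 2)

# Round 2: after shifting, one window mixes pixels from four round-1 windows.
shifted = window_partition(cyclic_shift(ids, 1), 2)

# Attention restricted to a single window of 2-dimensional feature vectors.
attended = self_attention([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
```

In this sketch, attention restricted to `local` windows only sees neighbouring pixels, while alternating with `shifted` windows lets information cross window borders over successive layers, which is one common way to reach distant pixels without computing global attention.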
Pages: 8170-8177
Number of pages: 8