Attention-Based Grasp Detection With Monocular Depth Estimation

Cited by: 1
Authors
Xuan Tan, Phan [1 ]
Hoang, Dinh-Cuong [2 ]
Nguyen, Anh-Nhat [3 ]
Nguyen, Van-Thiep [3 ]
Vu, Van-Duc [3 ]
Nguyen, Thu-Uyen [3 ]
Hoang, Ngoc-Anh [3 ]
Phan, Khanh-Toan [3 ]
Tran, Duc-Thanh [3 ]
Vu, Duy-Quang [3 ]
Ngo, Phuc-Quan [2 ]
Duong, Quang-Tri [2 ]
Ho, Ngoc-Trung [3 ]
Tran, Cong-Trinh [3 ]
Duong, Van-Hiep [3 ]
Mai, Anh-Truong [3 ]
Affiliations
[1] Shibaura Inst Technol, Coll Engn, Tokyo 1358548, Japan
[2] FPT Univ, Greenwich Vietnam, Hanoi 10000, Vietnam
[3] FPT Univ, IT Dept, Hanoi 10000, Vietnam
Keywords
Pose estimation; robot vision systems; intelligent systems; deep learning; supervised learning; machine vision;
DOI
10.1109/ACCESS.2024.3397718
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
Grasp detection plays a pivotal role in robotic manipulation, allowing robots to interact with and manipulate objects in their surroundings. Traditionally, this has relied on three-dimensional (3D) point cloud data acquired from specialized depth cameras. However, the limited availability of such sensors in real-world scenarios poses a significant challenge. In many practical applications, robots operate in diverse environments where obtaining high-quality 3D point cloud data may be impractical or impossible. This paper introduces an innovative approach to grasp generation using color images, thereby eliminating the need for dedicated depth sensors. Our method capitalizes on advanced deep learning techniques for depth estimation directly from color images. Instead of relying on conventional depth sensors, our approach computes predicted point clouds based on estimated depth images derived directly from Red-Green-Blue (RGB) input data. To our knowledge, this is the first study to explore the use of predicted depth data for grasp detection, moving away from the traditional dependence on depth sensors. The novelty of this work is the development of a fusion module that seamlessly integrates features extracted from RGB images with those inferred from the predicted point clouds. Additionally, we adapt a voting mechanism from our previous work (VoteGrasp) to enhance robustness to occlusion and generate collision-free grasps. Experimental evaluations conducted on standard datasets validate the effectiveness of our approach, demonstrating its superior performance in generating grasp configurations compared to existing methods. With our proposed method, we achieved a significant 4% improvement in average precision compared to state-of-the-art grasp detection methods. Furthermore, our method demonstrates promising practical viability through real robot grasping experiments, achieving an impressive 84% success rate.
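The abstract states that the method computes predicted point clouds from depth images estimated directly from RGB input. The paper's own implementation is not reproduced here, but the back-projection step it relies on is the standard pinhole camera model: each pixel (u, v) with estimated depth z maps to a 3D point via the camera intrinsics (fx, fy, cx, cy). A minimal sketch of that conversion, with an illustrative function name and toy intrinsics (not from the paper):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, in metres) into an N x 3 point
    cloud using the pinhole camera model:
        x = (u - cx) * z / fx,  y = (v - cy) * z / fy."""
    h, w = depth.shape
    # pixel coordinate grids: u varies along columns, v along rows
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # drop pixels with no valid depth estimate (z <= 0)
    return points[points[:, 2] > 0]

# toy example: a flat 2x2 depth map at 1 m with hypothetical intrinsics
depth = np.ones((2, 2), dtype=np.float32)
pts = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

In the paper's pipeline, `depth` would come from a monocular depth-estimation network rather than a sensor; the resulting predicted point cloud is then fused with RGB features for grasp voting.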
Pages: 65041-65057
Page count: 17