Attention-Based Grasp Detection With Monocular Depth Estimation

Cited by: 1
Authors
Xuan Tan, Phan [1 ]
Hoang, Dinh-Cuong [2 ]
Nguyen, Anh-Nhat [3 ]
Nguyen, Van-Thiep [3 ]
Vu, Van-Duc [3 ]
Nguyen, Thu-Uyen [3 ]
Hoang, Ngoc-Anh [3 ]
Phan, Khanh-Toan [3 ]
Tran, Duc-Thanh [3 ]
Vu, Duy-Quang [3 ]
Ngo, Phuc-Quan [2 ]
Duong, Quang-Tri [2 ]
Ho, Ngoc-Trung [3 ]
Tran, Cong-Trinh [3 ]
Duong, Van-Hiep [3 ]
Mai, Anh-Truong [3 ]
Affiliations
[1] Shibaura Inst Technol, Coll Engn, Tokyo 1358548, Japan
[2] FPT Univ, Greenwich Vietnam, Hanoi 10000, Vietnam
[3] FPT Univ, IT Dept, Hanoi 10000, Vietnam
Keywords
Pose estimation; robot vision systems; intelligent systems; deep learning; supervised learning; machine vision
DOI
10.1109/ACCESS.2024.3397718
Chinese Library Classification (CLC) Number
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
Grasp detection plays a pivotal role in robotic manipulation, allowing robots to interact with and manipulate objects in their surroundings. Traditionally, this has relied on three-dimensional (3D) point cloud data acquired from specialized depth cameras. However, the limited availability of such sensors in real-world scenarios poses a significant challenge. In many practical applications, robots operate in diverse environments where obtaining high-quality 3D point cloud data may be impractical or impossible. This paper introduces an innovative approach to grasp generation using color images, thereby eliminating the need for dedicated depth sensors. Our method capitalizes on advanced deep learning techniques for depth estimation directly from color images. Instead of relying on conventional depth sensors, our approach computes predicted point clouds based on estimated depth images derived directly from Red-Green-Blue (RGB) input data. To our knowledge, this is the first study to explore the use of predicted depth data for grasp detection, moving away from the traditional dependence on depth sensors. The novelty of this work is the development of a fusion module that seamlessly integrates features extracted from RGB images with those inferred from the predicted point clouds. Additionally, we adapt a voting mechanism from our previous work (VoteGrasp) to enhance robustness to occlusion and generate collision-free grasps. Experimental evaluations conducted on standard datasets validate the effectiveness of our approach, demonstrating its superior performance in generating grasp configurations compared to existing methods. With our proposed method, we achieved a significant 4% improvement in average precision compared to state-of-the-art grasp detection methods. Furthermore, our method demonstrates promising practical viability through real robot grasping experiments, achieving an impressive 84% success rate.
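The key intermediate step described above, converting a predicted depth image into a point cloud, is standard pinhole-camera back-projection. The following is a minimal sketch of that step only, not the authors' published code; the camera intrinsics (fx, fy, cx, cy) and the estimate_depth call are assumptions standing in for whatever monocular depth network is used:

import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) predicted depth map (meters) into an
    (N, 3) camera-frame point cloud using the pinhole model:
    X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel coordinates
    z = depth.astype(np.float32)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth

# Hypothetical usage: estimate_depth stands in for any monocular
# depth-estimation network run on the RGB input.
# depth = estimate_depth(rgb)  # (H, W) predicted depth
# cloud = depth_to_point_cloud(depth, fx=615.0, fy=615.0, cx=320.0, cy=240.0)

The resulting predicted point cloud is what the paper's fusion module consumes alongside features extracted from the RGB image.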
Pages: 65041-65057
Page count: 17