A Smart Dual-modal Aligned Transformer Deep Network for Robotic Grasp Detection

Authors
Cang, Xin [1]
Zhang, Haojun [1]
Yang, Yuequan [1]
Cao, Zhiqiang [2]
Li, Fudong [1]
Zhu, Jiaming [1]
Affiliations
[1] Yangzhou Univ, Sch Informat Engn, Yangzhou, Jiangsu, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Source
2024 14th Asian Control Conference (ASCC 2024) | 2024
Funding
National Natural Science Foundation of China
Keywords
Dual modalities; Feature alignment; Robotic grasping; Transformer
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Robotic grasping is a crucial visual task for both service robots and industrial robots. Most existing deep learning approaches to robotic grasp detection either treat RGB-D data as a single modality or use the RGB and depth channels indiscriminately, and therefore often overlook the valuable depth information in RGB-D images. To address this limitation, this paper proposes a smart dual-modal aligned transformer deep network (SATNet), which is both very lightweight and high-performing on robotic grasping tasks using RGB-D images. Specifically, a novel ATFormer module with two parallel aligned transformer encoder blocks is designed to fuse global feature maps efficiently. Experiments on the Cornell dataset demonstrate that the proposed model outperforms existing methods: it has a lightweight framework with only 0.27M parameters and achieves an accuracy of 97.8% with an inference time of 16.3 ms.
Pages: 1230-1235
Number of pages: 6