A Smart Dual-modal Aligned Transformer Deep Network for Robotic Grasp Detection

Citations: 0
Authors
Cang, Xin [1 ]
Zhang, Haojun [1 ]
Yang, Yuequan [1 ]
Cao, Zhiqiang [2 ]
Li, Fudong [1 ]
Zhu, Jiaming [1 ]
Affiliations
[1] Yangzhou Univ, Sch Informat Engn, Yangzhou, Jiangsu, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Source
2024 14TH ASIAN CONTROL CONFERENCE, ASCC 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Dual modalities; Feature alignment; Robotic grasping; Transformer;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Robotic grasping is a crucial visual task for service robots as well as industrial robots. Most existing deep vision learning approaches for robotic grasping either treat RGB-D as a single modality or use its modalities indiscriminately, and so often overlook the valuable depth information in RGB-D images. To address this limitation, this paper proposes a smart dual-modal aligned transformer deep network (SATNet), which is both very lightweight and high-performing on robotic grasping tasks using RGB-D images. Specifically, a novel ATFormer module with two parallel aligned transformer encoder blocks is designed to fuse global feature maps efficiently. Experiments on the Cornell dataset demonstrate that the proposed model outperforms existing methods: it has a lightweight framework with only 0.27M parameters, and it achieves an accuracy of 97.8% with an inference time of 16.3 ms.
Pages: 1230 - 1235
Page count: 6
Related papers
41 in total
  • [21] Robotic Objects Detection and Grasping in Clutter Based on Cascaded Deep Convolutional Neural Network
    Liu, Dong
    Tao, Xiantong
    Yuan, Liheng
    Du, Yu
    Cong, Ming
    [J]. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [22] Multimodal transformer graph convolution attention isomorphism network (MTCGAIN): a novel deep network for detection of insomnia disorder
    Wang, Yulong
    Ren, Yande
    Bi, Yuzhen
    Zhao, Feng
    Bai, Xingzhen
    Wei, Liangzhou
    Liu, Wanting
    Ma, Hancheng
    Bai, Peirui
    [J]. QUANTITATIVE IMAGING IN MEDICINE AND SURGERY, 2024, 14 (05) : 3350 - 3365
  • [23] A transformer-based deep neural network for arrhythmia detection using continuous ECG signals
    Hu, Rui
    Chen, Jie
    Zhou, Li
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 144
  • [24] A Multi-Scale Transformer Fusion Deep Clustering Network for Unsupervised Planetary Change Detection
    Jia, Yutong
    Wan, Gang
    Liu, Jia
    Zhao, Chenxu
    Wang, Guoping
    Zhang, Yifan
    Liu, Lei
    Xie, Bin
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [25] A Dual Transformer-Based Deep Learning Model for Passenger Anomaly Behavior Detection in Elevator Cabs
    Ji, Yijin
    Sun, Haoxiang
    Xu, Benlian
    Lu, Mingli
    Zhou, Xu
    Shi, Jian
    [J]. INTERNATIONAL JOURNAL OF SWARM INTELLIGENCE RESEARCH, 2024, 15 (01)
  • [26] Computer-assisted diagnosis for axillary lymph node metastasis of early breast cancer based on transformer with dual-modal adaptive mid-term fusion using ultrasound elastography
    Gong, Chihao
    Wu, Yinglan
    Zhang, Guangyuan
    Liu, Xuan
    Zhu, Xiaoyao
    Cai, Nian
    Li, Jian
    [J]. COMPUTERIZED MEDICAL IMAGING AND GRAPHICS, 2025, 119
  • [27] Transformer guidance dual-stream network for salient object detection in optical remote sensing images
    Zhang, Yi
    Guo, Jichang
    Yue, Huihui
    Yin, Xiangjun
    Zheng, Sida
    [J]. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (24) : 17733 - 17747
  • [28] Transformer guidance dual-stream network for salient object detection in optical remote sensing images
    Yi Zhang
    Jichang Guo
    Huihui Yue
    Xiangjun Yin
    Sida Zheng
    [J]. Neural Computing and Applications, 2023, 35 : 17733 - 17747
  • [29] Remote Sensing Image Change Detection Transformer Network Based on Dual-Feature Mixed Attention
    Song, Xinyang
    Hua, Zhen
    Li, Jinjiang
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [30] Unifying convolution and transformer: a dual stage network equipped with cross-interactive multi-modal feature fusion and edge guidance for RGB-D salient object detection
    Abraham S.E.
    Kovoor B.C.
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2024, 15 (04) : 2341 - 2359