A Smart Dual-modal Aligned Transformer Deep Network for Robotic Grasp Detection

Cited by: 0

Authors:

Cang, Xin [1]

Zhang, Haojun [1]

Yang, Yuequan [1]

Cao, Zhiqiang [2]

Li, Fudong [1]

Zhu, Jiaming [1]

Affiliations:

[1] Yangzhou Univ, Sch Informat Engn, Yangzhou, Jiangsu, Peoples R China

[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China

Source:

2024 14TH ASIAN CONTROL CONFERENCE, ASCC 2024 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

Dual modalities; Feature alignment; Robotic grasping; Transformer;

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

Robotic grasping is a crucial visual task for both service robots and industrial robots. Most existing deep vision learning approaches to robotic grasping either treat RGB-D as a single modality or use the two modalities indiscriminately, and so often overlook the valuable depth information in RGB-D images. To address this limitation, this paper proposes a smart dual-modal aligned transformer deep network (SATNet), which is both very lightweight and performs well on robotic grasping tasks using RGB-D images. Specifically, a novel ATFormer module with two parallel aligned transformer encoder blocks is designed to fuse global feature maps efficiently. Experiments on the Cornell dataset demonstrate that the proposed model outperforms existing methods: it has a lightweight framework with only 0.27M parameters while achieving an accuracy of 97.8% and an inference time of 16.3 ms.

Pages: 1230-1235

Page count: 6

41 items in total

[11] Deep learning detection network for peripheral blood leukocytes based on improved detection transformer
Leng, Bing
Wang, Chunqing
Leng, Min
Ge, Mingfeng
Dong, Wenfei
BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 82
[12] Real-time deep learning approach to visual servo control and grasp detection for autonomous robotic manipulation
Ribeiro, Eduardo Godinho
Mendes, Raul de Queiroz
Grassi Jr, Valdir
ROBOTICS AND AUTONOMOUS SYSTEMS, 2021, 139
[13] Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection
Xuqiang Zhuang
Fangai Liu
Jian Hou
Jianhua Hao
Xiaohong Cai
Neural Processing Letters, 2022, 54 : 1943 - 1960
[14] Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection
Zhuang, Xuqiang
Liu, Fangai
Hou, Jian
Hao, Jianhua
Cai, Xiaohong
NEURAL PROCESSING LETTERS, 2022, 54 (03) : 1943 - 1960
[15] A Transformer-Optimized Deep Learning Network for Road Damage Detection and Tracking
Wang, Niannian
Shang, Lihang
Song, Xiaotian
SENSORS, 2023, 23 (17)
[16] Lightweight robotic grasping detection network based on dual attention and inverted residual
Yang, Yuequan
Li, Wei
Cao, Zhiqiang
Bao, Jiatong
Li, Fudong
TRANSACTIONS OF THE INSTITUTE OF MEASUREMENT AND CONTROL, 2024, 46 (14) : 2687 - 2695
[17] FastGNet: an efficient 6-DOF grasp detection method with multi-attention mechanisms and point transformer network
Ding, Zichao
Wang, Aimin
Gao, Maosen
Li, Jiazhe
MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (04)
[18] Dual-attention transformer-based hybrid network for multi-modal medical image segmentation
Zhang, Menghui
Zhang, Yuchen
Liu, Shuaibing
Han, Yahui
Cao, Honggang
Qiao, Bingbing
SCIENTIFIC REPORTS, 2024, 14 (01)
[19] Multimodal driver distraction detection using dual-channel network of CNN and Transformer
Mou, Luntian
Chang, Jiali
Zhou, Chao
Zhao, Yiyuan
Ma, Nan
Yin, Baocai
Jain, Ramesh
Gao, Wen
EXPERT SYSTEMS WITH APPLICATIONS, 2023, 234
[20] A Transformer-Based Deep Learning Model for Sleep Apnea Detection and Application on RingConn Smart Ring
Wu, Zetong
Wu, Hao
Fang, Kaiqun
Sze, Keith Siu-Fung
Feng, Qianjin
2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024