Dynamic multi-headed self-attention and multiscale enhancement vision transformer for object detection

Times Cited: 0
Authors
Fang, Sikai [1 ]
Lu, Xiaofeng [1 ,2 ]
Huang, Yifan [1 ]
Sun, Guangling [1 ]
Liu, Xuefeng [1 ]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, 99 Shangda Rd, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Wenzhou Inst, Wenzhou, Peoples R China
Keywords
Dynamic gate; Multiscale; Object detection; Self-attention; Vision transformer;
DOI
10.1007/s11042-024-18234-8
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The self-attention-based vision transformer has powerful feature extraction capabilities and has demonstrated competitive performance in several tasks. However, the conventional self-attention mechanism exhibits global perceptual properties that favor large-scale objects, so room for improvement remains in detection performance for objects at other scales. To address this issue, the dynamic gate-assisted network (DGANet), a novel yet simple framework, is proposed to enhance the multiscale generalization capability of the vision transformer structure. First, we design the dynamic multi-headed self-attention mechanism (DMH-SAM), which dynamically selects self-attention components and uses a local-to-global self-attention pattern, enabling the model to learn features of objects at different scales autonomously while reducing the computational effort. Then, we propose a dynamic multiscale encoder (DMEncoder), which weights and encodes feature maps with different perceptual fields to self-adapt the performance gap of the network at each object scale. Extensive ablation and comparison experiments demonstrate the effectiveness of the proposed method. Its detection accuracy for small, medium and large objects reaches 27.6, 47.4 and 58.5, respectively, surpassing state-of-the-art object detection methods while reducing model complexity by 23%.
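To make the dynamic-gate idea behind DMH-SAM concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: a standard multi-head self-attention layer is augmented with a lightweight gating branch that predicts a per-head weight from the pooled tokens, so individual heads can be emphasized or suppressed per input. The class name GatedMultiHeadSelfAttention, the average-pool-plus-sigmoid gate, and all hyperparameters are illustrative assumptions; the local-to-global attention pattern and the DMEncoder weighting are omitted.

    # Hypothetical sketch of a dynamically gated multi-head self-attention layer.
    import torch
    import torch.nn as nn

    class GatedMultiHeadSelfAttention(nn.Module):
        """Multi-head self-attention with a per-head dynamic gate.

        A lightweight gating branch predicts a weight for each head from the
        mean-pooled tokens; heads with near-zero gates contribute little,
        approximating the dynamic selection of self-attention components
        described in the abstract (illustrative assumption).
        """

        def __init__(self, dim: int, num_heads: int = 8):
            super().__init__()
            assert dim % num_heads == 0, "dim must be divisible by num_heads"
            self.num_heads = num_heads
            self.head_dim = dim // num_heads
            self.scale = self.head_dim ** -0.5

            self.qkv = nn.Linear(dim, dim * 3)
            self.proj = nn.Linear(dim, dim)
            # Gate: global average over tokens -> per-head weight in [0, 1].
            self.gate = nn.Sequential(nn.Linear(dim, num_heads), nn.Sigmoid())

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, tokens, dim)
            B, N, C = x.shape
            qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
            q, k, v = qkv.permute(2, 0, 3, 1, 4)      # each: (B, heads, N, head_dim)

            attn = (q @ k.transpose(-2, -1)) * self.scale
            attn = attn.softmax(dim=-1)
            out = attn @ v                            # (B, heads, N, head_dim)

            g = self.gate(x.mean(dim=1))              # (B, heads), one weight per head
            out = out * g[:, :, None, None]           # dynamically weight each head

            out = out.transpose(1, 2).reshape(B, N, C)
            return self.proj(out)

    if __name__ == "__main__":
        layer = GatedMultiHeadSelfAttention(dim=64, num_heads=8)
        tokens = torch.randn(2, 196, 64)              # e.g. a 14x14 patch grid
        print(layer(tokens).shape)                     # torch.Size([2, 196, 64])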
Pages: 67213 - 67229
Number of Pages: 17
Related Papers (50 in total)
  • [41] Transformer enhanced by local perception self-attention for dynamic soft sensor modeling of industrial processes
    Fang, Zeyu
    Gao, Shiwei
    Dang, Xiaochao
    Dong, Xiaohui
    Wang, Qiong
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (05)
  • [42] UAV image object detection based on self-attention guidance and global feature fusion
    Bai, Jing
    Hu, Haiyang
    Liu, Xiaojing
    Zhuang, Shanna
    Wang, Zhengyou
    IMAGE AND VISION COMPUTING, 2024, 151
  • [43] Multiscale self-attention for unmanned aerial vehicle-based infrared thermal images detection
    Ali, Muhammad Shahroze
    Latif, Afshan
    Anwar, Muhammad Waseem
    Ashraf, Muhammad Hashir
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 149
  • [44] DiagSWin: A multi-scale vision transformer with diagonal-shaped windows for object detection and segmentation
    Li, Ke
    Wang, Di
    Liu, Gang
    Zhu, Wenxuan
    Zhong, Haodi
    Wang, Quan
    NEURAL NETWORKS, 2024, 180
  • [45] Multi-Scale Feature Attention-DEtection TRansformer: Multi-Scale Feature Attention for security check object detection
    Sima, Haifeng
    Chen, Bailiang
    Tang, Chaosheng
    Zhang, Yudong
    Sun, Junding
    IET COMPUTER VISION, 2024, 18 (05) : 613 - 625
  • [46] A fast self-attention cascaded network for object detection in large scene remote sensing images
    Hua, Xia
    Wang, Xinqing
    Rui, Ting
    Zhang, Haitao
    Wang, Dong
    APPLIED SOFT COMPUTING, 2020, 94
  • [47] Spiking Neural Networks for Object Detection Based on Integrating Neuronal Variants and Self-Attention Mechanisms
    Li, Weixuan
    Zhao, Jinxiu
    Su, Li
    Jiang, Na
    Hu, Quan
    APPLIED SCIENCES-BASEL, 2024, 14 (20)
  • [48] Presentation attack detection based on two-stream vision transformers with self-attention fusion
    Peng, Fei
    Meng, Shao-hua
    Long, Min
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2022, 85
  • [49] Multi-stage Transient Stability Assessment of Power System Based on Self-attention Transformer Encoder
    Fang J.
    Liu C.
    Su C.
    Lin H.
    Zheng L.
    Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering, 2023, 43 (15): 5745 - 5758
  • [50] An Improved End-to-End Multi-Target Tracking Method Based on Transformer Self-Attention
    Hong, Yong
    Li, Deren
    Luo, Shupei
    Chen, Xin
    Yang, Yi
    Wang, Mi
    REMOTE SENSING, 2022, 14 (24)