Dynamic multi-headed self-attention and multiscale enhancement vision transformer for object detection

Cited by: 0

Authors:

Fang, Sikai ^{[1]}

Lu, Xiaofeng ^{[1,2]}

Huang, Yifan ^{[1]}

Sun, Guangling ^{[1]}

Liu, Xuefeng ^{[1]}

Affiliations:

[1] Shanghai Univ, Sch Commun & Informat Engn, 99 Shangda Rd, Shanghai 200444, Peoples R China

[2] Shanghai Univ, Wenzhou Inst, Wenzhou, Peoples R China

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2024 / Vol. 83 / Issue 25

Keywords:

Dynamic gate; Multiscale; Object detection; Self-attention; Vision transformer;

DOI:

10.1007/s11042-024-18234-8

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The self-attention-based vision transformer has powerful feature extraction capabilities and has demonstrated competitive performance in several tasks. However, because the conventional self-attention mechanism exhibits global perceptual properties and favors large-scale objects, there is still room for improvement at other scales in object detection. To address this issue, the dynamic gate-assisted network (DGANet), a novel yet simple framework, is proposed to enhance the multiscale generalization capability of the vision transformer structure. First, we design the dynamic multi-headed self-attention mechanism (DMH-SAM), which dynamically selects the self-attention components and uses a local-to-global self-attention pattern that enables the model to learn features of objects at different scales autonomously while reducing the computational effort. Then, we propose a dynamic multiscale encoder (DMEncoder), which weights and encodes feature maps with different receptive fields so that the network adaptively narrows its performance gap across object scales. Extensive ablation and comparison experiments demonstrate the effectiveness of the proposed method. Its detection accuracy for small, medium and large targets reaches 27.6, 47.4 and 58.5, respectively, outperforming the most advanced object detection methods, while its model complexity is reduced by 23%.


Pages: 67213-67229

Page count: 17

Total: 50 items

[21] Enhanced Multiscale Vision Transformer with Cascaded Feature Fusion for Efficient Object Detection in Remote Sensing Images
Zhang, Xiangyu
Wang, Cui
Yang, Xiande
JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2025
[22] Multi-Camera 3D Object Detection for Autonomous Driving Using Deep Learning and Self-Attention Mechanism
Hazarika, Ananya
Vyas, Amit
Rahmati, Mehdi
Wang, Yan
IEEE ACCESS, 2023, 11 : 64608 - 64620
[23] YOLOv3 object detection method by introducing Gaussian mask self-attention module
Kong Ya-jie
Zhang Ye
CHINESE JOURNAL OF LIQUID CRYSTALS AND DISPLAYS, 2022, 37 (04) : 539 - 548
[24] CAG-FPN: CHANNEL SELF-ATTENTION GUIDED FEATURE PYRAMID NETWORK FOR OBJECT DETECTION
Chang, Jie
Dai, Huhe
Zheng, Yuan
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024), 2024 : 9616 - 9620
[25] Channel Self-Attention Based Multiscale Spatial-Frequency Domain Network for Oriented Object Detection in Remote Sensing Imagery
Xu, Yang
Pan, Yushan
Wu, Zebin
Wei, Zhihui
Zhan, Tianming
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
[26] CWCT: An Effective Vision Transformer using improved Cross-Window Self-Attention and CNN
Li, Mengxing
Song, Ying
Wang, Bo
2022 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES ABSTRACTS AND WORKSHOPS (VRW 2022), 2022 : 140 - 145
[27] Progressive Domain Adaptive Object Detection Based on Self-Attention in Foggy Weather
Lin, Meng
Zhou, Gang
Yang, Yawei
Shi, Jun
IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING, 2023, 18 (12) : 1923 - 1931
[28] EPNet with Self-Attention for Fast and Accurate 3D Object Detection
Sakai, Yuto
Nishikawa, Hiroki
Kong, Xiangbo
Tomiyama, Hiroyuki
2024 INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS, AND COMMUNICATIONS, ITC-CSCC 2024, 2024
[29] Enhancing Multi-modal Features Using Local Self-attention for 3D Object Detection
Li, Hao
Zhang, Zehan
Zhao, Xian
Wang, Yulong
Shen, Yuxi
Pu, Shiliang
Mao, Hui
COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 : 532 - 549
[30] Tea Disease Detection Method with Multi-scale Self-attention Feature Fusion
Sun Y.
Wu F.
Yao J.
Zhou Q.
Shen J.
Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 2023, 54 (12) : 309 - 315
