Dynamic multi-headed self-attention and multiscale enhancement vision transformer for object detection

Authors
Fang, Sikai [1]
Lu, Xiaofeng [1,2]
Huang, Yifan [1]
Sun, Guangling [1]
Liu, Xuefeng [1]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, 99 Shangda Rd, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Wenzhou Inst, Wenzhou, Peoples R China
Keywords
Dynamic gate; Multiscale; Object detection; Self-attention; Vision transformer;
DOI
10.1007/s11042-024-18234-8
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The self-attention-based vision transformer has powerful feature extraction capabilities and has demonstrated competitive performance in several tasks. However, because the conventional self-attention mechanism exhibits global perceptual properties and favors large-scale objects, there is still room for improvement at other scales in object detection. To address this issue, the dynamic gate-assisted network (DGANet), a novel yet simple framework, is proposed to enhance the multiscale generalization capability of the vision transformer structure. First, we design the dynamic multi-headed self-attention mechanism (DMH-SAM), which dynamically selects the self-attention components and uses a local-to-global self-attention pattern that enables the model to learn features of objects at different scales autonomously while reducing the computational effort. Then, we propose a dynamic multiscale encoder (DMEncoder), which weights and encodes feature maps with different receptive fields so that the network adaptively narrows its performance gap across object scales. Extensive ablation and comparison experiments demonstrate the effectiveness of the proposed method. Its detection accuracy for small, medium, and large targets reaches 27.6, 47.4, and 58.5, respectively, exceeding that of the most advanced object detection methods, while its model complexity is reduced by 23%.
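The abstract describes DMH-SAM only at a high level, so the following is a minimal illustrative sketch of the general idea of gating attention heads with different receptive fields, not the authors' implementation. All names (`attention_head`, `gated_multi_head`), the identity Q/K/V projections, the 1-D windowing, the 0.5 threshold, and the hand-supplied gate logits are assumptions made for illustration; in a real model the gates would be predicted from the input.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(tokens, window=None):
    """Scaled dot-product self-attention with identity projections (assumption).

    If `window` is given, token i attends only to tokens j with |i - j| <= window
    (a local head); with window=None it attends to every token (a global head).
    """
    n, d = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        idx = [j for j in range(n) if window is None or abs(i - j) <= window]
        scores = [
            sum(a * b for a, b in zip(tokens[i], tokens[j])) / math.sqrt(d)
            for j in idx
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * tokens[j][k] for w, j in zip(weights, idx)) for k in range(d)]
        )
    return out

def gated_multi_head(tokens, windows, gate_logits, threshold=0.5):
    """Run only the heads whose sigmoid gate exceeds `threshold` (hypothetical rule).

    Returns the average of the active heads' outputs and the number of heads run.
    Skipped heads cost nothing, which is where the compute saving would come from.
    """
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    active = [w for w, g in zip(windows, gates) if g > threshold]
    if not active:
        # No head selected: fall back to passing the tokens through unchanged.
        return [row[:] for row in tokens], 0
    outs = [attention_head(tokens, w) for w in active]
    n, d = len(tokens), len(tokens[0])
    merged = [
        [sum(o[i][k] for o in outs) / len(outs) for k in range(d)] for i in range(n)
    ]
    return merged, len(active)

if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    # Local-to-global pattern: window 1, window 2, then global (None).
    out, n_active = gated_multi_head(tokens, [1, 2, None], [2.0, -2.0, 1.0])
    print(n_active, out)
```

In this sketch the second head is skipped because its gate is closed, so only the narrowest local head and the global head are evaluated.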
Pages: 67213-67229
Number of pages: 17