Dynamic multi-headed self-attention and multiscale enhancement vision transformer for object detection

Cited by: 0
Authors
Fang, Sikai [1]
Lu, Xiaofeng [1,2]
Huang, Yifan [1]
Sun, Guangling [1]
Liu, Xuefeng [1]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, 99 Shangda Rd, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Wenzhou Inst, Wenzhou, Peoples R China
Keywords
Dynamic gate; Multiscale; Object detection; Self-attention; Vision transformer
DOI
10.1007/s11042-024-18234-8
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The self-attention-based vision transformer has powerful feature extraction capabilities and has demonstrated competitive performance in several tasks. However, because the conventional self-attention mechanism exhibits global perceptual properties and favors large-scale objects, there is still room for improvement at other scales in object detection. To address this issue, the dynamic gate-assisted network (DGANet), a novel yet simple framework, is proposed to enhance the multiscale generalization capability of the vision transformer structure. First, we design the dynamic multi-headed self-attention mechanism (DMH-SAM), which dynamically selects the self-attention components and uses a local-to-global self-attention pattern, enabling the model to learn features of objects at different scales autonomously while reducing the computational cost. Then, we propose a dynamic multiscale encoder (DMEncoder), which weights and encodes feature maps with different perceptual fields so that the network adapts to the performance gap between objects of different scales. Extensive ablation and comparison experiments demonstrate the effectiveness of the proposed method. Its detection accuracy for small, medium and large targets reaches 27.6, 47.4 and 58.5, respectively, outperforming the most advanced object detection methods, while its model complexity is reduced by 23%.
Pages: 67213 - 67229
Page count: 17
Related papers
50 in total
  • [31] Speech enhancement method based on the multi-head self-attention mechanism
    Chang X.
    Zhang Y.
    Yang L.
    Kou J.
    Wang X.
    Xu D.
    Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2020, 47 (01) : 104 - 110
  • [32] YOLO-MS: Multispectral Object Detection via Feature Interaction and Self-Attention Guided Fusion
    Xie, Yumin
    Zhang, Langwen
    Yu, Xiaoyuan
    Xie, Wei
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, 15 (04) : 2132 - 2143
  • [33] Multi-Type Self-Attention Guided Degraded Saliency Detection
    Zhou, Ziqi
    Wang, Zheng
    Lu, Huchuan
    Wang, Song
    Sun, Meijun
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13082 - 13089
  • [34] Vision Transformer With Enhanced Self-Attention for Few-Shot Ship Target Recognition in Complex Environments
    Tian, Yang
    Meng, Hao
    Yuan, Fei
    Ling, Yue
    Yuan, Ningze
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
  • [35] 3D Object Detection Based on Voxel Self-Attention Auxiliary Networks
    Cao, Jie
    Peng, Yiqiang
    Fan, Likang
    Wang, Longfei
    LASER & OPTOELECTRONICS PROGRESS, 2024, 61 (24)
  • [36] PPDTSA: Privacy-preserving Deep Transformation Self-attention Framework For Object Detection
    Ma, Bo
    Wu, Jinsong
    Lai, Edmund
    Hu, Shuolin
    2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021
  • [37] Small Object Detection in Remote Sensing Images Based on Window Self-Attention Mechanism
    Xu, Jiaxin
    Zhang, Qiao
    Liu, Yu
    Zheng, Mengting
    PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING, 2023, 89 (08) : 489 - 497
  • [38] YOLO-SA: An Efficient Object Detection Model Based on Self-attention Mechanism
    Li, Ang
    Song, Xiangyu
    Sun, ShiJie
    Zhang, Zhaoyang
    Cai, Taotao
    Song, Huansheng
    WEB AND BIG DATA, PT IV, APWEB-WAIM 2023, 2024, 14334 : 1 - 15
  • [39] Object Detection Model Based on Scene-Level Region Proposal Self-Attention
    Quan, Yu
    Li, Zhixin
    Zhang, Canlong
    Ma, Huifang
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021 : 954 - 961
  • [40] DPNET: DUAL-PATH NETWORK FOR EFFICIENT OBJECT DETECTION WITH LIGHTWEIGHT SELF-ATTENTION
    Shi, Huimin
    Zhou, Quan
    Ni, Yinghao
    Wu, Xiaofu
    Latecki, Longin Jan
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022 : 771 - 775