Dynamic multi-headed self-attention and multiscale enhancement vision transformer for object detection

Times Cited: 0
Authors
Fang, Sikai [1 ]
Lu, Xiaofeng [1 ,2 ]
Huang, Yifan [1 ]
Sun, Guangling [1 ]
Liu, Xuefeng [1 ]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, 99 Shangda Rd, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Wenzhou Inst, Wenzhou, Peoples R China
Keywords
Dynamic gate; Multiscale; Object detection; Self-attention; Vision transformer;
DOI
10.1007/s11042-024-18234-8
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The self-attention-based vision transformer has powerful feature extraction capabilities and has demonstrated competitive performance in several tasks. However, the conventional self-attention mechanism exhibits global perceptual properties that favor large-scale objects, so room for improvement remains in detection performance for objects at other scales. To address this issue, the dynamic gate-assisted network (DGANet), a novel yet simple framework, is proposed to enhance the multiscale generalization capability of the vision transformer structure. First, we design the dynamic multi-headed self-attention mechanism (DMH-SAM), which dynamically selects self-attention components and uses a local-to-global self-attention pattern, enabling the model to learn features of objects at different scales autonomously while reducing the computational effort. Then, we propose a dynamic multiscale encoder (DMEncoder), which weights and encodes feature maps with different perceptual fields to self-adapt the performance gap of the network at each object scale. Extensive ablation and comparison experiments demonstrate the effectiveness of the proposed method. Its detection accuracy for small, medium and large objects reaches 27.6, 47.4 and 58.5, respectively, surpassing state-of-the-art object detection methods while reducing model complexity by 23%.
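To make the dynamic-gate idea behind DMH-SAM concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: a standard multi-head self-attention layer is augmented with a lightweight gating branch that predicts a per-head weight from the pooled tokens, so individual heads can be emphasized or suppressed per input. The class name GatedMultiHeadSelfAttention, the average-pool-plus-sigmoid gate, and all hyperparameters are illustrative assumptions; the local-to-global attention pattern and the DMEncoder weighting are omitted.

    # Hypothetical sketch of a dynamically gated multi-head self-attention layer.
    import torch
    import torch.nn as nn

    class GatedMultiHeadSelfAttention(nn.Module):
        """Multi-head self-attention with a per-head dynamic gate.

        A lightweight gating branch predicts a weight for each head from the
        mean-pooled tokens; heads with near-zero gates contribute little,
        approximating the dynamic selection of self-attention components
        described in the abstract (illustrative assumption).
        """

        def __init__(self, dim: int, num_heads: int = 8):
            super().__init__()
            assert dim % num_heads == 0, "dim must be divisible by num_heads"
            self.num_heads = num_heads
            self.head_dim = dim // num_heads
            self.scale = self.head_dim ** -0.5

            self.qkv = nn.Linear(dim, dim * 3)
            self.proj = nn.Linear(dim, dim)
            # Gate: global average over tokens -> per-head weight in [0, 1].
            self.gate = nn.Sequential(nn.Linear(dim, num_heads), nn.Sigmoid())

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, tokens, dim)
            B, N, C = x.shape
            qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
            q, k, v = qkv.permute(2, 0, 3, 1, 4)      # each: (B, heads, N, head_dim)

            attn = (q @ k.transpose(-2, -1)) * self.scale
            attn = attn.softmax(dim=-1)
            out = attn @ v                            # (B, heads, N, head_dim)

            g = self.gate(x.mean(dim=1))              # (B, heads), one weight per head
            out = out * g[:, :, None, None]           # dynamically weight each head

            out = out.transpose(1, 2).reshape(B, N, C)
            return self.proj(out)

    if __name__ == "__main__":
        layer = GatedMultiHeadSelfAttention(dim=64, num_heads=8)
        tokens = torch.randn(2, 196, 64)              # e.g. a 14x14 patch grid
        print(layer(tokens).shape)                     # torch.Size([2, 196, 64])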
Pages: 67213 - 67229
Number of Pages: 17
Related Papers (50 in total)
  • [41] Transformer enhanced by local perception self-attention for dynamic soft sensor modeling of industrial processes
    Fang, Zeyu
    Gao, Shiwei
    Dang, Xiaochao
    Dong, Xiaohui
    Wang, Qiong
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (05)
  • [42] UAV image object detection based on self-attention guidance and global feature fusion
    Bai, Jing
    Hu, Haiyang
    Liu, Xiaojing
    Zhuang, Shanna
    Wang, Zhengyou
    IMAGE AND VISION COMPUTING, 2024, 151
  • [43] Multiscale self-attention for unmanned aerial vehicle-based infrared thermal images detection
    Ali, Muhammad Shahroze
    Latif, Afshan
    Anwar, Muhammad Waseem
    Ashraf, Muhammad Hashir
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 149
  • [44] DiagSWin: A multi-scale vision transformer with diagonal-shaped windows for object detection and segmentation
    Li, Ke
    Wang, Di
    Liu, Gang
    Zhu, Wenxuan
    Zhong, Haodi
    Wang, Quan
    NEURAL NETWORKS, 2024, 180
  • [45] Multi-Scale Feature Attention-DEtection TRansformer: Multi-Scale Feature Attention for security check object detection
    Sima, Haifeng
    Chen, Bailiang
    Tang, Chaosheng
    Zhang, Yudong
    Sun, Junding
    IET COMPUTER VISION, 2024, 18 (05) : 613 - 625
  • [46] A fast self-attention cascaded network for object detection in large scene remote sensing images
    Hua, Xia
    Wang, Xinqing
    Rui, Ting
    Zhang, Haitao
    Wang, Dong
    APPLIED SOFT COMPUTING, 2020, 94
  • [47] Spiking Neural Networks for Object Detection Based on Integrating Neuronal Variants and Self-Attention Mechanisms
    Li, Weixuan
    Zhao, Jinxiu
    Su, Li
    Jiang, Na
    Hu, Quan
    APPLIED SCIENCES-BASEL, 2024, 14 (20)
  • [48] Presentation attack detection based on two-stream vision transformers with self-attention fusion
    Peng, Fei
    Meng, Shao-hua
    Long, Min
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2022, 85
  • [49] Multi-stage Transient Stability Assessment of Power System Based on Self-attention Transformer Encoder
    Fang J.
    Liu C.
    Su C.
    Lin H.
    Zheng L.
    Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering, 2023, 43 (15): 5745 - 5758
  • [50] An Improved End-to-End Multi-Target Tracking Method Based on Transformer Self-Attention
    Hong, Yong
    Li, Deren
    Luo, Shupei
    Chen, Xin
    Yang, Yi
    Wang, Mi
    REMOTE SENSING, 2022, 14 (24)