Multi-scale coupled attention for visual object detection

Cited by: 2
Authors
Li, Fei [1]
Yan, Hongping [2]
Shi, Linsu [1]
Affiliations
[1] China Tower Corp Ltd, 9 Dongran North St, Beijing 100195, Peoples R China
[2] China Univ Geosci, Xueyuan Rd 29, Beijing 100083, Peoples R China
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
Attention mechanism; Deep neural networks; Object detection; Self-attention learning; Transformer; YOLO;
DOI
10.1038/s41598-024-60897-8
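Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy, earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code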
07 ; 0710 ; 09 ;
Abstract
Deep neural networks have achieved remarkable success in object detection. However, network structures still need to be continually refined and finely tuned to achieve better performance. This need is driven by the continuing demand for high performance in complex scenes, where the objects to be detected appear at multiple scales and are scattered across the image. To this end, this paper proposes a network structure called Multi-Scale Coupled Attention (MSCA), built within the framework of self-attention learning and using importance-assessment methods. Architecturally, it consists of a Multi-Scale Coupled Channel Attention (MSCCA) module and a Multi-Scale Coupled Spatial Attention (MSCSA) module. Specifically, the MSCCA module performs self-attention learning linearly over the multi-scale channels. In parallel, the MSCSA module performs it nonlinearly over the multi-scale spatial grids. The MSCCA and MSCSA modules can be connected in sequence and used as a plug-in for developing end-to-end learning models for object detection. Finally, the proposed network is compared on two public datasets with 13 classical or state-of-the-art models, including Faster R-CNN, Cascade R-CNN, RetinaNet, SSD, PP-YOLO, YOLO v3, YOLO v5, YOLO v7, YOLOX, DETR, conditional DETR, UP-DETR and FP-DETR. The comparative numerical results, the ablation study, and the observed performance behaviour all demonstrate the effectiveness of the proposed model.
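
The abstract describes a channel-attention stage followed by a spatial-attention stage, connected in sequence as a plug-in. The sketch below illustrates only that general pattern on a single-scale feature map, in plain Python. It is not the authors' MSCCA or MSCSA: the abstract does not specify the multi-scale coupling or the importance-assessment step, so both are omitted. The function names, the `weights` matrix, and the mean-plus-max spatial descriptor are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, weights):
    """Gate each channel of a C x H x W nested-list feature map.

    `weights` is a hypothetical C x C matrix; the gate logits are a
    linear function of the pooled channel descriptors.
    """
    # Global average pooling: one scalar descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Linear mixing across channels, then a sigmoid gate per channel.
    gates = [sigmoid(sum(w * p for w, p in zip(w_row, pooled))) for w_row in weights]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, gates)]


def spatial_attention(fmap):
    """Gate each spatial position using statistics taken across channels."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    gate = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            column = [fmap[c][i][j] for c in range(channels)]
            # Mean and max over channels (max makes this step nonlinear).
            gate[i][j] = sigmoid(sum(column) / channels + max(column))
    return [[[fmap[c][i][j] * gate[i][j] for j in range(width)]
             for i in range(height)] for c in range(channels)]


def attend(fmap, weights):
    """Apply channel attention, then spatial attention, in sequence."""
    return spatial_attention(channel_attention(fmap, weights))
```

Because both stages return a feature map with the same shape as the input, a block like `attend` can in principle be placed between existing layers of a detector without changing the surrounding tensor shapes, which is the sense in which such a module acts as a plug-in.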
Pages: 19