Multi-scale coupled attention for visual object detection

Cited by: 2
Authors
Li, Fei [1]
Yan, Hongping [2]
Shi, Linsu [1]
Affiliations
[1] China Tower Corp Ltd, 9 Dongran North St, Beijing 100195, Peoples R China
[2] China Univ Geosci, Xueyuan Rd 29, Beijing 100083, Peoples R China
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / No. 01
Keywords
Attention mechanism; Deep neural networks; Object detection; Self-attention learning; Transformer; YOLO
DOI
10.1038/s41598-024-60897-8
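Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy, earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract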
The application of deep neural networks has achieved remarkable success in object detection. However, network structures still need to be evolved continually and tuned finely to achieve better performance. This need is driven by the ongoing demand for high performance in complex scenes, where the multi-scale objects to be detected are scattered across the image. To this end, this paper proposes a network structure called Multi-Scale Coupled Attention (MSCA), built within the framework of self-attention learning using importance-assessment methods. Architecturally, it consists of a Multi-Scale Coupled Channel Attention (MSCCA) module and a Multi-Scale Coupled Spatial Attention (MSCSA) module. Specifically, the MSCCA module performs self-attention learning linearly on the multi-scale channels. In parallel, the MSCSA module performs it nonlinearly on the multi-scale spatial grids. The MSCCA and MSCSA modules can be connected in sequence and used as a plugin for building end-to-end object detection models. Finally, the proposed network is compared on two public datasets with 13 classical or state-of-the-art models: Faster R-CNN, Cascade R-CNN, RetinaNet, SSD, PP-YOLO, YOLO v3, YOLO v5, YOLO v7, YOLOX, DETR, Conditional DETR, UP-DETR and FP-DETR. Comparative results with numerical scores, an ablation study, and an analysis of performance behaviour all demonstrate the effectiveness of the proposed model.
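The abstract describes the design only at a high level: a channel-attention stage and a spatial-attention stage, each multi-scale, connected in sequence as a shape-preserving plugin. The NumPy sketch below illustrates that general pattern on a single (C, H, W) feature map. It is not the authors' MSCA. The pooling scales (1, 2, 4), the box-filter kernel sizes (3, 5, 7), the tanh nonlinearity, the weight shape, and every function name are hypothetical choices in the spirit of CBAM-style attention, made only to show how a linear channel gate and a nonlinear spatial gate can be chained.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, mapping logits to gates in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def avg_pool_grid(x, s):
    """Average-pool a (C, H, W) map onto an s x s grid, returning (C, s*s).

    Assumes H and W are divisible by s.
    """
    c, h, w = x.shape
    pooled = x.reshape(c, s, h // s, s, w // s).mean(axis=(2, 4))
    return pooled.reshape(c, s * s)


def channel_attention(x, weight, scales=(1, 2, 4)):
    """Hypothetical multi-scale channel gate: pooled descriptors -> linear map -> sigmoid.

    weight has shape (C, C * K) with K = sum(s * s for s in scales).
    """
    # Stack descriptors from every pooling scale: shape (C, K)
    desc = np.concatenate([avg_pool_grid(x, s) for s in scales], axis=1)
    # A single linear projection mixes all channels and scales into C logits
    logits = weight @ desc.reshape(-1)
    return sigmoid(logits)


def box_blur(m, k):
    """Mean filter with a k x k window (k odd) on an (H, W) map, edge-padded."""
    p = k // 2
    padded = np.pad(m, p, mode="edge")
    h, w = m.shape
    out = np.zeros_like(m, dtype=float)
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + h, dj:dj + w]
    return out / (k * k)


def spatial_attention(x, kernels=(3, 5, 7)):
    """Hypothetical multi-scale spatial gate with a nonlinear (tanh) squashing step."""
    # Channel-wise mean and max give two (H, W) descriptors
    mean_map = x.mean(axis=0)
    max_map = x.max(axis=0)
    # Aggregate spatial context at several receptive-field sizes
    multi = sum(box_blur(mean_map, k) + box_blur(max_map, k) for k in kernels)
    # Nonlinearity, then a sigmoid gate in (0, 1) for every grid cell
    return sigmoid(np.tanh(multi / len(kernels)))


def coupled_attention(x, weight):
    """Channel gate followed in sequence by a spatial gate; output shape equals input shape."""
    x = x * channel_attention(x, weight)[:, None, None]
    x = x * spatial_attention(x)[None, :, :]
    return x


rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
K = sum(s * s for s in (1, 2, 4))  # 21 pooled cells per channel
feat = rng.standard_normal((C, H, W))
weight = 0.01 * rng.standard_normal((C, C * K))
out = coupled_attention(feat, weight)
print(out.shape)  # (8, 16, 16)
```

Because each stage only rescales the feature map by gates in (0, 1), the block keeps the input shape, which is what allows a module of this kind to be inserted between stages of an existing detector without changing the layers around it.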
Pages: 19