Multi-Scale Aggregation Transformers for Multispectral Object Detection

Cited by: 5
Authors
You, Shuai [1]
Xie, Xuedong [2]
Feng, Yujian [1]
Mei, Chaojun [2]
Ji, Yimu [2]
Affiliations
[1] Nanjing Univ Posts & Telecommun NJUPT, Sch Internet Things, Nanjing 210023, Peoples R China
[2] NJUPT, Sch Comp Sci & Technol, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multispectral object detection; modality differences; multi-scale features; transformer
DOI
10.1109/LSP.2023.3309578
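Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract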
Multispectral object detection for autonomous driving is a multi-object localization and classification task on visible and thermal modalities. In this scenario, modality differences lead to a lack of object information in any single modality and to misalignment of cross-modality information. To alleviate these problems, most existing methods extract information at a single scale (e.g., they mainly focus on detecting salient cars or pedestrians), which leads to insufficient performance in capturing multi-scale discriminative information (e.g., small bicycles and blurred pedestrians) and creates safety hazards while driving. In this letter, we propose a Multi-Scale Aggregation Network (MSANet) consisting of two parts, a Multi-Scale Aggregation Transformer (MSAT) and a Cross-modal Merging Fusion Mechanism (CMFM). MSANet combines the advantages of Transformers and CNNs to extract rich image information from the two modalities by mining both local and global context dependencies. First, to reduce the lack of information in a single modality, we design a novel MSAT module that extracts rich detail and texture at multiple scales. Second, to alleviate the feature misalignment caused by modality differences, the CMFM aggregates complementary information at multiple levels. Comprehensive experiments on two benchmarks demonstrate that our approach achieves better results than several state-of-the-art methods.
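The two ideas named in the abstract, aggregating features over several scales and merging visible and thermal features, can be illustrated with a toy sketch. This is not the MSAT or CMFM from the letter: the pooling-based aggregation, the magnitude-based fusion weights, and every function name below (`avg_pool`, `upsample`, `multi_scale_aggregate`, `fuse`) are assumptions chosen only to make the general pattern concrete on small 2D grids.

```python
from typing import List, Sequence

Grid = List[List[float]]


def avg_pool(x: Grid, k: int) -> Grid:
    """Non-overlapping k x k average pooling (height and width divisible by k)."""
    h, w = len(x), len(x[0])
    return [
        [
            sum(x[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w, k)
        ]
        for i in range(0, h, k)
    ]


def upsample(x: Grid, k: int) -> Grid:
    """Nearest-neighbour upsampling by an integer factor k."""
    return [[v for v in row for _ in range(k)] for row in x for _ in range(k)]


def multi_scale_aggregate(x: Grid, scales: Sequence[int] = (1, 2, 4)) -> Grid:
    """Hypothetical stand-in for multi-scale aggregation: average the map's
    views at several scales after bringing each back to full resolution."""
    h, w = len(x), len(x[0])
    views = [upsample(avg_pool(x, k), k) for k in scales]
    return [[sum(v[i][j] for v in views) / len(views) for j in range(w)] for i in range(h)]


def fuse(visible: Grid, thermal: Grid) -> Grid:
    """Hypothetical stand-in for cross-modal fusion: a per-pixel weighted merge
    where each modality is weighted by the relative magnitude of its response."""
    fused = []
    for row_v, row_t in zip(visible, thermal):
        row = []
        for a, b in zip(row_v, row_t):
            total = abs(a) + abs(b)
            w_a = 0.5 if total == 0 else abs(a) / total
            row.append(w_a * a + (1 - w_a) * b)
        fused.append(row)
    return fused


if __name__ == "__main__":
    visible = [[float(i + j) for j in range(4)] for i in range(4)]
    thermal = [[float(i * j) for j in range(4)] for i in range(4)]
    merged = fuse(multi_scale_aggregate(visible), multi_scale_aggregate(thermal))
    for row in merged:
        print(row)
```

In a learned model the fixed pooling and the magnitude-based weights would be replaced by trainable attention and convolution layers; the sketch only shows where multi-scale views and cross-modal weights enter the computation.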
Pages: 1172-1176
Number of pages: 5
Related papers
50 in total
  • [1] Multi-scale aggregation feature pyramid with cornerness for underwater object detection
    Li, Xinbin
    Yu, Haifeng
    Chen, Haiyang
    VISUAL COMPUTER, 2024, 40 (02): 1299 - 1310
  • [2] Multi-scale aggregation feature pyramid with cornerness for underwater object detection
    Xinbin Li
    Haifeng Yu
    Haiyang Chen
    The Visual Computer, 2024, 40 (2) : 1299 - 1310
  • [3] Multi-Scale Residual Aggregation Feature Pyramid Network for Object Detection
    Wang, Hongyang
    Wang, Tiejun
    ELECTRONICS, 2023, 12 (01)
  • [4] Multi-scale feature aggregation and boundary awareness network for salient object detection
    Wu, Qin
    Wang, Jianzhe
    Chai, Zhilei
    Guo, Guodong
    IMAGE AND VISION COMPUTING, 2022, 122
  • [5] Multi-scale Information Aggregation for Spoofing Detection
    Li, Changtao
    Wan, Yi
    Yang, Feiran
    Yang, Jun
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01)
  • [6] Multi-Scale Object Detection by Clustering Lines
    Ommer, Bjoern
    Malik, Jitendra
    2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2009: 484 - 491
  • [7] Selective Multi-scale Learning for Object Detection
    Chen, Junliang
    Lu, Weizeng
    Shen, Linlin
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT II, 2021, 12892 : 3 - 14
  • [8] Feature Enhancement for Multi-scale Object Detection
    Huicheng Zheng
    Jiajie Chen
    Lvran Chen
    Ye Li
    Zhiwei Yan
    Neural Processing Letters, 2020, 51 : 1907 - 1919
  • [9] Feature Enhancement for Multi-scale Object Detection
    Zheng, Huicheng
    Chen, Jiajie
    Chen, Lvran
    Li, Ye
    Yan, Zhiwei
    NEURAL PROCESSING LETTERS, 2020, 51 (02) : 1907 - 1919
  • [10] Attention to the Scale: Deep Multi-Scale Salient Object Detection
    Zhang, Jing
    Dai, Yuchao
    Li, Bo
    He, Mingyi
    2017 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING - TECHNIQUES AND APPLICATIONS (DICTA), 2017: 105 - 111