Multi-Scale Aggregation Transformers for Multispectral Object Detection

Cited by: 5
Authors
You, Shuai [1]
Xie, Xuedong [2]
Feng, Yujian [1]
Mei, Chaojun [2]
Ji, Yimu [2]
Affiliations
[1] Nanjing Univ Posts & Telecommun NJUPT, Sch Internet Things, Nanjing 210023, Peoples R China
[2] NJUPT, Sch Comp Sci & Technol, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multispectral object detection; modality differences; multi-scale features; transformer
DOI
10.1109/LSP.2023.3309578
CLC classification numbers
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809
Abstract
Multispectral object detection for autonomous driving is a multi-object localization and classification task performed on visible and thermal modalities. In this scenario, modality differences lead to missing object information in a single modality and to misalignment of cross-modality information. To alleviate these problems, most existing methods extract information at a single scale (e.g., they mainly focus on detecting prominent cars or pedestrians), which limits their ability to capture multi-scale discriminative information (e.g., small bicycles and blurred pedestrians) and creates safety hazards during driving. In this letter, we propose a Multi-Scale Aggregation Network (MSANet) consisting of two parts, the Multi-Scale Aggregation Transformer (MSAT) and the Cross-modal Merging Fusion Mechanism (CMFM), which combines the advantages of Transformers and CNNs to extract rich image information from both modalities by mining local and global context dependencies. First, to reduce the lack of information in a single modality, we design a novel MSAT module that extracts rich details and texture at multiple scales. Second, to alleviate feature misalignment caused by modality differences, the CMFM aggregates complementary information at multiple levels. Comprehensive experiments on two benchmarks demonstrate that our approach outperforms several state-of-the-art methods.
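The abstract does not describe the internals of MSAT or CMFM, so the following is only a generic sketch of the two ideas it names, not the authors' method. It assumes (hypothetically) that multi-scale aggregation can be illustrated as attention from full-resolution query positions to keys/values average-pooled at several scales, and that cross-modal merging can be illustrated as a per-channel sigmoid gate between visible and thermal feature maps. The function names, the scales (1, 2, 4), and the gating rule are illustrative assumptions; a real module would use learned projections, multiple heads and convolutions.

```python
import numpy as np


def avg_pool(x, s):
    """Average-pool a (C, H, W) map with non-overlapping s x s windows.
    Assumes H and W are divisible by s."""
    c, h, w = x.shape
    return x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))


def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_scale_attention(x, scales=(1, 2, 4)):
    """Hypothetical multi-scale aggregation (not MSAT itself).

    Every full-resolution position (query) attends to keys/values pooled
    at several scales: scale 1 keeps fine local detail, larger scales
    summarize coarser, more global context. Single head, no learned
    weights, plus a residual connection.
    """
    c, h, w = x.shape
    q = x.reshape(c, h * w).T  # (HW, C) queries
    kv = np.concatenate(
        [avg_pool(x, s).reshape(c, -1).T for s in scales], axis=0
    )  # (N_tokens, C) multi-scale keys/values
    attn = softmax(q @ kv.T / np.sqrt(c), axis=-1)  # (HW, N_tokens)
    out = attn @ kv  # (HW, C) aggregated context per position
    return out.T.reshape(c, h, w) + x


def gated_fusion(vis, thr):
    """Hypothetical cross-modal merge (not CMFM itself).

    A per-channel gate in (0, 1), computed from the globally pooled
    features of both modalities, weights the visible map against the
    thermal map.
    """
    diff = vis.mean(axis=(1, 2)) - thr.mean(axis=(1, 2))  # (C,)
    g = 1.0 / (1.0 + np.exp(-diff))
    g = g[:, None, None]  # broadcast the gate over H and W
    return g * vis + (1.0 - g) * thr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vis = rng.standard_normal((8, 16, 16))  # toy visible-band features
    thr = rng.standard_normal((8, 16, 16))  # toy thermal-band features
    fused = gated_fusion(multi_scale_attention(vis), multi_scale_attention(thr))
    print(fused.shape)  # (8, 16, 16)
```

With a 16 x 16 map and scales (1, 2, 4), each query attends over 256 + 64 + 16 = 336 tokens, which is how pooled tokens add coarse context at modest extra cost compared with attending only at full resolution.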
Pages: 1172-1176
Page count: 5
Related papers
50 in total
  • [31] Multi-Scale Vision Transformer for Defect Object Detection
    Lou, Liangshan
    Lu, Ke
    Xue, Jian
    Procedia Computer Science, 2023, 222 : 397 - 406
  • [32] Multi-scale volumes for deep object detection and localization
    Ohn-Bar, Eshed
    Trivedi, Mohan Manubhai
    PATTERN RECOGNITION, 2017, 61 : 557 - 572
  • [33] Multi-scale HOG Feature Used in Object Detection
    Li, Jin
    Zhang, Hong
    Zhang, Lei
    Li, Yawei
    Kang, Qiaochu
    Luo, Zhaohui
    Wu, Yujie
    TENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2018), 2019, 11069
  • [34] Multi-Scale Cascade Network for Salient Object Detection
    Li, Xin
    Yang, Fan
    Cheng, Hong
    Chen, Junyu
    Guo, Yuxiao
    Chen, Leiting
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 439 - 447
  • [35] MGFPN: Enhancing multi-scale feature for object detection
    He, Weiming
    Wu, You
    Xiao, Jing
    Cao, Yang
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 40 (06) : 11171 - 11181
  • [36] Salient object detection based on multi-scale contrast
    Wang, Hai
    Dai, Lei
    Cai, Yingfeng
    Sun, Xiaoqiang
    Chen, Long
    NEURAL NETWORKS, 2018, 101 : 47 - 56
  • [37] Multi-scale coupled attention for visual object detection
    Li, Fei
    Yan, Hongping
    Shi, Linsu
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [38] Lightweight multi-scale network for small object detection
    Li L.
    Li B.
    Zhou H.
    PeerJ Computer Science, 2022, 8
  • [39] Object Detection Using Multi-Scale Balanced Sampling
    Yu, Hang
    Gong, Jiulu
    Chen, Derong
    APPLIED SCIENCES-BASEL, 2020, 10 (17):
  • [40] Camouflage Object Segmentation with Multi-scale Feature Aggregation and Boundary Generation
    He, Ye
    Su, Wen
    Ge, Jinfeng
    Jia, Guoqiang
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VIII, 2025, 15038 : 426 - 439