Multi-Scale Aggregation Transformers for Multispectral Object Detection

Cited by: 5
Authors
You, Shuai [1]
Xie, Xuedong [2]
Feng, Yujian [1]
Mei, Chaojun [2]
Ji, Yimu [2]
Affiliations
[1] Nanjing Univ Posts & Telecommun NJUPT, Sch Internet Things, Nanjing 210023, Peoples R China
[2] NJUPT, Sch Comp Sci & Technol, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multispectral object detection; modality differences; multi-scale features; transformer;
DOI
10.1109/LSP.2023.3309578
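Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;
Abstract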
Multispectral object detection for autonomous driving is a multi-object localization and classification task on visible and thermal modalities. In this scenario, modality differences lead to a lack of object information in any single modality and to misalignment of cross-modality information. To alleviate these problems, most existing methods extract information at a single scale (e.g., they mainly focus on detecting prominent cars or pedestrians), which leads to insufficient performance in capturing multi-scale discriminative information (e.g., small bicycles and blurred pedestrians) and creates safety hazards during driving. In this letter, we propose a Multi-Scale Aggregation Network (MSANet) consisting of two parts, the Multi-Scale Aggregation Transformer (MSAT) and the Cross-modal Merging Fusion Mechanism (CMFM), which combines the advantages of Transformers and CNNs to extract rich image information from the two modalities by mining both local and global context dependencies. First, to compensate for the lack of information in a single modality, we design a novel MSAT module to extract rich details and texture at multiple scales. Second, to alleviate feature misalignment caused by modality differences, the CMFM aggregates complementary information at multiple levels. Comprehensive experiments on two benchmarks demonstrate that our approach achieves better results than several state-of-the-art methods.
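The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the general idea it describes: pooling one modality's features at several scales and letting the other modality attend over them. It is not the authors' MSAT or CMFM. The function names (`avg_pool`, `multi_scale_tokens`, `cross_modal_fuse`), the pooling scales, and the residual attention fusion are all assumptions made for illustration, and no learned weights are involved.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool a (C, H, W) feature map with a k x k window and stride k."""
    c, h, w = x.shape
    h2, w2 = h // k, w // k
    x = x[:, :h2 * k, :w2 * k]  # crop so H and W divide evenly by k
    return x.reshape(c, h2, k, w2, k).mean(axis=(2, 4))

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_tokens(x, scales=(1, 2, 4)):
    """Pool x at each scale and stack the results as tokens of shape (N, C)."""
    tokens = [avg_pool(x, k).reshape(x.shape[0], -1).T for k in scales]
    return np.concatenate(tokens, axis=0)

def cross_modal_fuse(rgb, thermal, scales=(1, 2, 4)):
    """Hypothetical fusion: each visible-light position attends over
    multi-scale thermal tokens, and the result is added residually."""
    c, h, w = rgb.shape
    q = rgb.reshape(c, -1).T                        # (H*W, C) queries
    kv = multi_scale_tokens(thermal, scales)        # (N, C) keys and values
    attn = softmax(q @ kv.T / np.sqrt(c), axis=-1)  # (H*W, N), rows sum to 1
    fused = q + attn @ kv                           # residual aggregation
    return fused.T.reshape(c, h, w)

# Usage with random stand-ins for backbone feature maps (assumed shapes).
rng = np.random.default_rng(0)
rgb_feat = rng.normal(size=(8, 16, 16))
thermal_feat = rng.normal(size=(8, 16, 16))
fused_feat = cross_modal_fuse(rgb_feat, thermal_feat)
```

The fused map keeps the spatial resolution of the visible-light features, while the coarser thermal tokens contribute context from larger regions; a real detector would replace the fixed pooling and unparameterized attention with learned layers.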
Pages: 1172-1176
Page count: 5