Multi-Scale Aggregation Transformers for Multispectral Object Detection

Cited by: 5
Authors
You, Shuai [1]
Xie, Xuedong [2]
Feng, Yujian [1]
Mei, Chaojun [2]
Ji, Yimu [2]
Affiliations
[1] Nanjing Univ Posts & Telecommun NJUPT, Sch Internet Things, Nanjing 210023, Peoples R China
[2] NJUPT, Sch Comp Sci & Technol, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multispectral object detection; modality differences; multi-scale features; transformer;
DOI
10.1109/LSP.2023.3309578
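Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract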
Multispectral object detection for autonomous driving is a multi-object localization and classification task on visible and thermal modalities. In this scenario, modality differences lead to a lack of object information in any single modality and to misalignment of cross-modality information. To alleviate these problems, most existing methods extract information at a single scale (e.g., they mainly focus on detecting prominent cars or pedestrians), which leads to insufficient performance in capturing multi-scale discriminative information (e.g., small bicycles and blurred pedestrians) and creates safety hazards during driving. In this letter, we propose a Multi-Scale Aggregation Network (MSANet) consisting of two parts, the Multi-Scale Aggregation Transformer (MSAT) and the Cross-modal Merging Fusion Mechanism (CMFM), which combines the advantages of Transformers and CNNs to extract rich image information from the two modalities by mining both local and global context dependencies. First, to reduce the lack of information in a single modality, we design a novel MSAT module to extract rich details and textures at multiple scales. Second, to alleviate the feature misalignment caused by modality differences, the CMFM is used to aggregate complementary information at multiple levels. Comprehensive experiments on two benchmarks demonstrate that our approach achieves better results than several state-of-the-art methods.
Pages: 1172-1176
Number of pages: 5