Multi-Scale Aggregation Transformers for Multispectral Object Detection

Cited by: 5

Authors:

You, Shuai [1]

Xie, Xuedong [2]

Feng, Yujian [1]

Mei, Chaojun [2]

Ji, Yimu [2]

Affiliations:

[1] Nanjing Univ Posts & Telecommun NJUPT, Sch Internet Things, Nanjing 210023, Peoples R China

[2] NJUPT, Sch Comp Sci & Technol, Nanjing 210023, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2023 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Multispectral object detection; modality differences; multi-scale features; transformer;

DOI:

10.1109/LSP.2023.3309578

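Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code: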

0808 ; 0809 ;

Abstract:

Multispectral object detection for autonomous driving is a multi-object localization and classification task on visible and thermal modalities. In this scenario, modality differences lead to a lack of object information in a single modality and to misalignment of cross-modality information. To alleviate these problems, most existing methods extract information at a single scale (e.g., they mainly focus on detecting prominent cars or pedestrians), which leads to insufficient performance in capturing multi-scale discriminative information (e.g., small bicycles and blurred pedestrians) and to safety hazards during driving. In this letter, we propose a Multi-Scale Aggregation Network (MSANet) consisting of two parts, the Multi-Scale Aggregation Transformer (MSAT) and the Cross-modal Merging Fusion Mechanism (CMFM), which combines the advantages of Transformers and CNNs to extract rich image information from the two modalities by mining both local and global context dependencies. First, to reduce the lack of information in a single modality, we design a novel MSAT module to extract rich details and texture at multiple scales. Second, to alleviate feature misalignment caused by modality differences, the CMFM is used to aggregate complementary information at multiple levels. Comprehensive experiments on two benchmarks demonstrate that our approach achieves better results than several state-of-the-art methods.


Pages: 1172-1176

Page count: 5
