Multi-Scale Aggregation Transformers for Multispectral Object Detection

Cited by: 5

Authors:

You, Shuai [1]

Xie, Xuedong [2]

Feng, Yujian [1]

Mei, Chaojun [2]

Ji, Yimu [2]

Affiliations:

[1] Nanjing Univ Posts & Telecommun NJUPT, Sch Internet Things, Nanjing 210023, Peoples R China

[2] NJUPT, Sch Comp Sci & Technol, Nanjing 210023, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2023 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Multispectral object detection; modality differences; multi-scale features; transformer;

DOI:

10.1109/LSP.2023.3309578

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Multispectral object detection for autonomous driving is a multi-object localization and classification task on visible and thermal modalities. In this scenario, modality differences lead to a lack of object information in a single modality and to misalignment of cross-modality information. To alleviate these problems, most existing methods extract information at a single scale (e.g., they mainly focus on detecting prominent cars or pedestrians), which leads to insufficient performance in capturing multi-scale discriminative information (e.g., small bicycles and blurred pedestrians) and to safety hazards while driving. In this letter, we propose a Multi-Scale Aggregation Network (MSANet) consisting of two parts, the Multi-Scale Aggregation Transformer (MSAT) and the Cross-modal Merging Fusion Mechanism (CMFM). MSANet combines the advantages of Transformers and CNNs to extract rich image information from the two modalities by mining both local and global context dependencies. First, to reduce the lack of information in a single modality, we design a novel MSAT module to extract rich detail and texture at multiple scales. Second, to alleviate feature misalignment caused by modality differences, the CMFM aggregates complementary information at multiple levels. Comprehensive experiments on two benchmarks demonstrate that our approach achieves better results than several state-of-the-art methods.


Pages: 1172-1176

Page count: 5
