CrossFormer: Cross-guided attention for multi-modal object detection

Times cited: 10

Authors:

Lee, Seungik [1]

Park, Jaehyeong [2]

Park, Jinsun [2, 3]

Affiliations:

[1] Pusan Natl Univ, Dept Informat Convergence Engn Artificial Intellig, 2 Busandaehak ro 63beon gil, Busan 46241, South Korea

[2] Pusan Natl Univ, Sch Comp Sci & Engn, 2 Busandaehak Ro 63beon Gil, Busan 46241, South Korea

[3] Pusan Natl Univ, Ctr Artificial Intelligence Res, 2 Busandaehak ro 63beon gil, Pusan 46241, South Korea

Source:

PATTERN RECOGNITION LETTERS | 2024 / Vol. 179

Funding:

National Research Foundation, Singapore

Keywords:

Object detection; Multi-modal; Sensor fusion; Transformer

DOI:

10.1016/j.patrec.2024.02.012

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Object detection is one of the essential tasks in a variety of real-world applications such as autonomous driving and robotics. In a real-world scenario, unfortunately, there are numerous challenges such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle the problem, we propose a novel multi-modal object detection model that is built upon a hierarchical transformer and cross-guidance between different modalities. The proposed hierarchical transformer consists of domain-specific feature extraction networks where intermediate features are connected by the proposed Cross-Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one domain is regarded as a guide and the other is assigned to a base. After that, the cross-modal attention from the guide to the base is applied to the base feature. The CGAM works bidirectionally in parallel by exchanging roles between modalities to refine multi-modal features simultaneously. Experimental results on FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method is superior to previous multi-modal detection algorithms quantitatively and qualitatively.
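The abstract describes the CGAM only at a high level. As a rough illustration of the guide/base idea, the following is a minimal NumPy sketch of generic scaled dot-product cross-attention. It assumes that the base modality supplies the queries, the guide modality supplies the keys and values, and the attended result is added back to the base feature as a residual. These choices, the function names `cross_guided_attention` and `bidirectional_cgam`, the RGB/thermal naming, and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical and are not taken from the paper; the authors' module may differ (for example in normalization, multi-head splitting, or how feature maps are flattened into tokens).

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_guided_attention(base, guide, w_q, w_k, w_v):
    """Hypothetical guide-to-base cross-attention (not the paper's exact CGAM).

    base:  (N, C) tokens of the modality being refined
    guide: (M, C) tokens of the modality providing guidance
    w_q, w_k, w_v: (C, C) projection matrices (assumed; learned in practice)
    """
    q = base @ w_q    # queries come from the base modality
    k = guide @ w_k   # keys come from the guide modality
    v = guide @ w_v   # values come from the guide modality
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, M) base-to-guide similarities
    attn = softmax(scores, axis=-1)          # each row sums to 1 over guide tokens
    return base + attn @ v                   # residual update of the base feature


def bidirectional_cgam(rgb, thermal, params_rgb, params_thermal):
    # Both directions read the ORIGINAL inputs, so they are independent
    # and could be computed in parallel.
    rgb_refined = cross_guided_attention(rgb, thermal, *params_rgb)          # thermal guides RGB
    thermal_refined = cross_guided_attention(thermal, rgb, *params_thermal)  # RGB guides thermal
    return rgb_refined, thermal_refined


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = 32
    rgb = rng.standard_normal((16, C))      # e.g. a 4x4 feature map flattened to 16 tokens
    thermal = rng.standard_normal((16, C))
    params_rgb = tuple(0.1 * rng.standard_normal((C, C)) for _ in range(3))
    params_thermal = tuple(0.1 * rng.standard_normal((C, C)) for _ in range(3))
    rgb_out, thermal_out = bidirectional_cgam(rgb, thermal, params_rgb, params_thermal)
    print(rgb_out.shape, thermal_out.shape)
```

Computing both directions from the unmodified inputs is meant to reflect the abstract's statement that the two role assignments run in parallel; feeding one direction's output into the other would be a sequential design instead.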

Pages: 144-150

Page count: 7