CrossFormer: Cross-guided attention for multi-modal object detection

Cited by: 10
Authors
Lee, Seungik [1]
Park, Jaehyeong [2]
Park, Jinsun [2,3]
Affiliations
[1] Pusan Natl Univ, Dept Informat Convergence Engn Artificial Intellig, 2 Busandaehak ro 63beon gil, Busan 46241, South Korea
[2] Pusan Natl Univ, Sch Comp Sci & Engn, 2 Busandaehak Ro 63beon Gil, Busan 46241, South Korea
[3] Pusan Natl Univ, Ctr Artificial Intelligence Res, 2 Busandaehak ro 63beon gil, Pusan 46241, South Korea
Funding
National Research Foundation of Singapore
Keywords
Object detection; Multi-modal; Sensor fusion; TRANSFORMER;
DOI
10.1016/j.patrec.2024.02.012
CLC number: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Object detection is one of the essential tasks in a variety of real-world applications such as autonomous driving and robotics. In a real-world scenario, unfortunately, there are numerous challenges such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle the problem, we propose a novel multi-modal object detection model that is built upon a hierarchical transformer and cross-guidance between different modalities. The proposed hierarchical transformer consists of domain-specific feature extraction networks where intermediate features are connected by the proposed Cross-Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one domain is regarded as a guide and the other is assigned to a base. After that, the cross-modal attention from the guide to the base is applied to the base feature. The CGAM works bidirectionally in parallel by exchanging roles between modalities to refine multi-modal features simultaneously. Experimental results on FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method is superior to previous multi-modal detection algorithms quantitatively and qualitatively.
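The guide/base scheme described in the abstract can be sketched as a pair of cross-attention passes. This is a minimal illustrative sketch, not the paper's implementation: it uses single-head, projection-free attention (the actual CGAM presumably has learned query/key/value projections and operates on intermediate backbone features), and all function names and dimensions below are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(base, guide):
    """Cross-modal attention from guide to base: queries come from the base
    feature, keys/values from the guide feature. No learned projections here
    (an assumption made to keep the sketch self-contained)."""
    d = base.shape[-1]
    scores = base @ guide.T / np.sqrt(d)      # (n_base, n_guide) similarity
    attn = softmax(scores, axis=-1)           # each base token attends over guide tokens
    return base + attn @ guide                # residual refinement of the base feature

def cgam(feat_a, feat_b):
    """Bidirectional refinement: the two directions exchange guide/base roles
    and are computed in parallel from the original (pre-refinement) features."""
    refined_a = cross_attention(feat_a, feat_b)  # B guides A
    refined_b = cross_attention(feat_b, feat_a)  # A guides B
    return refined_a, refined_b

# Toy features: 16 tokens of 64-dim RGB and thermal features (hypothetical sizes).
rgb = np.random.rand(16, 64)
thermal = np.random.rand(16, 64)
rgb_refined, thermal_refined = cgam(rgb, thermal)
```

Because both directions read the unrefined features, neither modality's refinement depends on the other's output, which matches the abstract's claim that the module refines both features simultaneously.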
Pages: 144-150 (7 pages)