CrossFormer: Cross-guided attention for multi-modal object detection

Cited by: 10

Authors:

Lee, Seungik [1]

Park, Jaehyeong [2]

Park, Jinsun [2,3]

Affiliations:

[1] Pusan Natl Univ, Dept Informat Convergence Engn Artificial Intellig, 2 Busandaehak ro 63beon gil, Busan 46241, South Korea

[2] Pusan Natl Univ, Sch Comp Sci & Engn, 2 Busandaehak Ro 63beon Gil, Busan 46241, South Korea

[3] Pusan Natl Univ, Ctr Artificial Intelligence Res, 2 Busandaehak ro 63beon gil, Pusan 46241, South Korea

Source:

PATTERN RECOGNITION LETTERS | 2024 / Vol. 179

Funding:

National Research Foundation of Singapore;

Keywords:

Object detection; Multi-modal; Sensor fusion; TRANSFORMER;

DOI:

10.1016/j.patrec.2024.02.012

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Object detection is an essential task in a variety of real-world applications such as autonomous driving and robotics. Unfortunately, real-world scenarios pose numerous challenges, such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle these problems, we propose a novel multi-modal object detection model built upon a hierarchical transformer and cross-guidance between different modalities. The proposed hierarchical transformer consists of domain-specific feature extraction networks whose intermediate features are connected by the proposed Cross-Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one domain is regarded as the guide and the other as the base. The cross-modal attention from the guide to the base is then applied to the base feature. The CGAM works bidirectionally in parallel, exchanging roles between the modalities to refine the multi-modal features simultaneously. Experimental results on the FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method outperforms previous multi-modal detection algorithms both quantitatively and qualitatively.
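The guide/base mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' CGAM: the abstract does not specify projections, heads, normalization, or how the attended features are merged. The sketch therefore assumes single-head scaled dot-product attention with identity query/key/value projections, token features stored as plain lists with the same channel count in both modalities, and a residual sum that applies the attended guide features to the base. All function names (cross_attention, guided_refine, bidirectional_cgam_sketch, two_stream_forward) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A feature map flattened to tokens: one list of channel values per token.
Matrix = List[List[float]]
Stage = Callable[[Matrix], Matrix]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(base: Matrix, guide: Matrix) -> Matrix:
    """Queries come from the base; keys and values come from the guide.

    Assumption: a single head with identity Q/K/V projections (no learned weights).
    """
    d = len(base[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in base:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in guide]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, guide)) for c in range(d)])
    return out


def guided_refine(base: Matrix, guide: Matrix) -> Matrix:
    """Apply guide-to-base attention to the base feature (assumed residual sum)."""
    attended = cross_attention(base, guide)
    return [[b + a for b, a in zip(b_row, a_row)] for b_row, a_row in zip(base, attended)]


def bidirectional_cgam_sketch(rgb: Matrix, thermal: Matrix) -> Tuple[Matrix, Matrix]:
    """Each modality serves once as the base and once as the guide.

    Both calls read the original inputs, so they are independent and could run in parallel.
    """
    rgb_refined = guided_refine(base=rgb, guide=thermal)
    thermal_refined = guided_refine(base=thermal, guide=rgb)
    return rgb_refined, thermal_refined


def two_stream_forward(
    rgb: Matrix,
    thermal: Matrix,
    rgb_stages: Sequence[Stage],
    thermal_stages: Sequence[Stage],
) -> Tuple[Matrix, Matrix]:
    """Hypothetical hierarchical wiring: after each domain-specific stage,
    the intermediate features of the two streams are exchanged by the sketch above."""
    for rgb_stage, thermal_stage in zip(rgb_stages, thermal_stages):
        rgb, thermal = rgb_stage(rgb), thermal_stage(thermal)
        rgb, thermal = bidirectional_cgam_sketch(rgb, thermal)
    return rgb, thermal


def identity(x: Matrix) -> Matrix:
    """Placeholder stage; a real backbone stage would transform the features."""
    return x


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]
    thermal = [[0.5, 0.5], [1.0, 0.0]]
    out_rgb, out_thermal = two_stream_forward(rgb, thermal, [identity, identity], [identity, identity])
    print(out_rgb, out_thermal)
```

Because both directions in bidirectional_cgam_sketch read the unrefined features, swapping which modality is the guide only swaps the outputs. This mirrors the role exchange described in the abstract. In a real detector the stages would be transformer blocks with learned weights, and feature resolutions would change between stages.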


Pages: 144-150

Number of pages: 7
