CrossFormer: Cross-guided attention for multi-modal object detection

Cited: 10
Authors
Lee, Seungik [1 ]
Park, Jaehyeong [2 ]
Park, Jinsun [2 ,3 ]
Affiliations
[1] Pusan Natl Univ, Dept Informat Convergence Engn Artificial Intellig, 2 Busandaehak-ro 63beon-gil, Busan 46241, South Korea
[2] Pusan Natl Univ, Sch Comp Sci & Engn, 2 Busandaehak-ro 63beon-gil, Busan 46241, South Korea
[3] Pusan Natl Univ, Ctr Artificial Intelligence Res, 2 Busandaehak-ro 63beon-gil, Busan 46241, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Object detection; Multi-modal; Sensor fusion; Transformer;
DOI
10.1016/j.patrec.2024.02.012
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Object detection is one of the essential tasks in a variety of real-world applications such as autonomous driving and robotics. In a real-world scenario, unfortunately, there are numerous challenges such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle the problem, we propose a novel multi-modal object detection model that is built upon a hierarchical transformer and cross-guidance between different modalities. The proposed hierarchical transformer consists of domain-specific feature extraction networks where intermediate features are connected by the proposed Cross-Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one domain is regarded as a guide and the other is assigned to a base. After that, the cross-modal attention from the guide to the base is applied to the base feature. The CGAM works bidirectionally in parallel by exchanging roles between modalities to refine multi-modal features simultaneously. Experimental results on FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method is superior to previous multi-modal detection algorithms quantitatively and qualitatively.
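The abstract describes the CGAM only at a high level. Below is a minimal sketch of how such a bidirectional cross-guided attention block could be wired, assuming standard multi-head attention in PyTorch; the class names (CrossGuidedAttention, BidirectionalCGAM), the feature dimensions, the query/key-value assignment, and the residual refinement scheme are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a bidirectional cross-guided attention block (CGAM-like),
# built from standard PyTorch multi-head attention. All names, dimensions, and the
# residual/normalization scheme are assumptions for illustration only.
import torch
import torch.nn as nn


class CrossGuidedAttention(nn.Module):
    """One direction of cross-modal attention: the base feature queries the guide
    feature, and the attended result refines the base (assumed layout)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_base = nn.LayerNorm(dim)
        self.norm_guide = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, base: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
        # base, guide: (batch, num_tokens, dim) feature sequences from each backbone
        q = self.norm_base(base)
        kv = self.norm_guide(guide)
        attended, _ = self.attn(query=q, key=kv, value=kv)
        # Residual refinement of the base feature with guide-derived attention.
        return base + attended


class BidirectionalCGAM(nn.Module):
    """Applies cross-guided attention in both directions in parallel, exchanging
    the guide/base roles between the two modalities (e.g., RGB and thermal)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.refine_rgb = CrossGuidedAttention(dim, num_heads)      # thermal guides RGB
        self.refine_thermal = CrossGuidedAttention(dim, num_heads)  # RGB guides thermal

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor):
        rgb_refined = self.refine_rgb(base=rgb, guide=thermal)
        thermal_refined = self.refine_thermal(base=thermal, guide=rgb)
        return rgb_refined, thermal_refined


if __name__ == "__main__":
    cgam = BidirectionalCGAM(dim=256)
    rgb = torch.randn(2, 196, 256)      # e.g., 14x14 RGB feature tokens
    thermal = torch.randn(2, 196, 256)  # matching thermal feature tokens
    out_rgb, out_thermal = cgam(rgb, thermal)
    print(out_rgb.shape, out_thermal.shape)
```

In this reading, each modality's feature is refined by attending over the other modality's feature, and the two directions run in parallel with swapped guide/base roles; the paper's actual attention layout and normalization may differ.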
Pages
144 - 150 (7 pages)