CrossFormer: Cross-guided attention for multi-modal object detection

被引:10
|
作者
Lee, Seungik [1 ]
Park, Jaehyeong [2 ]
Park, Jinsun [2 ,3 ]
机构
[1] Pusan Natl Univ, Dept Informat Convergence Engn Artificial Intellig, 2 Busandaehak ro 63beon gil, Busan 46241, South Korea
[2] Pusan Natl Univ, Sch Comp Sci & Engn, 2 Busandaehak Ro 63beon Gil, Busan 46241, South Korea
[3] Pusan Natl Univ, Ctr Artificial Intelligence Res, 2 Busandaehak ro 63beon gil, Pusan 46241, South Korea
基金
新加坡国家研究基金会;
关键词
Object detection; Multi-modal; Sensor fusion; TRANSFORMER;
D O I
10.1016/j.patrec.2024.02.012
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Object detection is one of the essential tasks in a variety of real -world applications such as autonomous driving and robotics. In a real -world scenario, unfortunately, there are numerous challenges such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle the problem, we propose a novel multi -modal object detection model that is built upon a hierarchical transformer and cross -guidance between different modalities. The proposed hierarchical transformer consists of domain -specific feature extraction networks where intermediate features are connected by the proposed Cross -Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one domain is regarded as a guide and the other is assigned to a base. After that, the cross -modal attention from the guide to the base is applied to the base feature. The CGAM works bidirectionally in parallel by exchanging roles between modalities to refine multi -modal features simultaneously. Experimental results on FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method is superior to previous multi -modal detection algorithms quantitatively and qualitatively.
引用
收藏
页码:144 / 150
页数:7
相关论文
共 50 条
  • [1] Progressive Guided Fusion Network With Multi-Modal and Multi-Scale Attention for RGB-D Salient Object Detection
    Wu, Jiajia
    Han, Guangliang
    Wang, Haining
    Yang, Hang
    Li, Qingqing
    Liu, Dongxu
    Ye, Fangjian
    Liu, Peixun
    IEEE ACCESS, 2021, 9 : 150608 - 150622
  • [2] Deep learning based object detection from multi-modal sensors: an overview
    Ye Liu
    Shiyang Meng
    Hongzhang Wang
    Jun Liu
    Multimedia Tools and Applications, 2024, 83 : 19841 - 19870
  • [3] Deep learning based object detection from multi-modal sensors: an overview
    Liu, Ye
    Meng, Shiyang
    Wang, Hongzhang
    Liu, Jun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (07) : 19841 - 19870
  • [4] Deep Multi-modal Object Detection for Autonomous Driving
    Ennajar, Amal
    Khouja, Nadia
    Boutteau, Remi
    Tlili, Fethi
    2021 18TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2021, : 7 - 11
  • [5] Multi-modal object detection via transformer network
    Liu, Wenbing
    Wang, Haibo
    Gao, Quanxue
    Zhu, Zhaorui
    IET IMAGE PROCESSING, 2023, 17 (12) : 3541 - 3550
  • [6] CGMDRNet: Cross-Guided Modality Difference Reduction Network for RGB-T Salient Object Detection
    Chen, Gang
    Shao, Feng
    Chai, Xiongli
    Chen, Hangwei
    Jiang, Qiuping
    Meng, Xiangchao
    Ho, Yo-Sung
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6308 - 6323
  • [7] Multi-Modal 3D Object Detection by Box Matching
    Liu, Zhe
    Ye, Xiaoqing
    Zou, Zhikang
    He, Xinwei
    Tan, Xiao
    Ding, Errui
    Wang, Jingdong
    Bai, Xiang
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, : 19917 - 19928
  • [8] Fraud Detection with Multi-Modal Attention and Correspondence Learning
    Park, Jongchan
    Kim, Min-Hyun
    Choi, Seibum
    Kweon, In So
    Choi, Dong-Geol
    2019 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2019, : 278 - 284
  • [9] CGFNet: Cross-Guided Fusion Network for RGB-T Salient Object Detection
    Wang, Jie
    Song, Kechen
    Bao, Yanqi
    Huang, Liming
    Yan, Yunhui
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (05) : 2949 - 2961
  • [10] Class-Agnostic Object Detection with Multi-modal Transformer
    Maaz, Muhammad
    Rasheed, Hanoona
    Khan, Salman
    Khan, Fahad Shahbaz
    Anwer, Rao Muhammad
    Yang, Ming-Hsuan
    COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 : 512 - 531