CrossFormer: Cross-guided attention for multi-modal object detection

Cited by: 10
Authors
Lee, Seungik [1]
Park, Jaehyeong [2]
Park, Jinsun [2,3]
Affiliations
[1] Pusan National University, Department of Information Convergence Engineering (Artificial Intelligence), 2 Busandaehak-ro 63beon-gil, Busan 46241, South Korea
[2] Pusan National University, School of Computer Science and Engineering, 2 Busandaehak-ro 63beon-gil, Busan 46241, South Korea
[3] Pusan National University, Center for Artificial Intelligence Research, 2 Busandaehak-ro 63beon-gil, Busan 46241, South Korea
Funding
National Research Foundation of Singapore
Keywords
Object detection; Multi-modal; Sensor fusion; Transformer
DOI
10.1016/j.patrec.2024.02.012
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Object detection is one of the essential tasks in a variety of real-world applications such as autonomous driving and robotics. In real-world scenarios, unfortunately, there are numerous challenges such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle these problems, we propose a novel multi-modal object detection model that is built upon a hierarchical transformer and cross-guidance between different modalities. The proposed hierarchical transformer consists of domain-specific feature extraction networks whose intermediate features are connected by the proposed Cross-Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one domain is regarded as the guide and the other as the base. The cross-modal attention from the guide to the base is then applied to the base feature. The CGAM works bidirectionally and in parallel, exchanging roles between the modalities so that both multi-modal features are refined simultaneously. Experimental results on the FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method outperforms previous multi-modal detection algorithms both quantitatively and qualitatively.
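To make the guide/base mechanism described in the abstract concrete, the following is a minimal single-head sketch in plain Python. It is not the authors' CGAM: the abstract does not say which modality supplies the queries, keys, and values, so this sketch assumes the base supplies queries, the guide supplies keys and values, and the attended result is added back to the base as a residual. The function names `guided_attention` and `cgam_bidirectional` and the weight matrices `w_q`, `w_k`, `w_v` are hypothetical, and both feature maps are assumed to be already flattened into token lists of equal length, as for spatially aligned RGB/thermal pairs.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def guided_attention(base, guide, w_q, w_k, w_v):
    """Refine `base` tokens with information gathered from `guide` tokens.

    Assumption (not from the paper): queries come from the base, keys and
    values from the guide, a single head, and a residual add onto the base.
    """
    q = matmul(base, w_q)
    k = matmul(guide, w_k)
    v = matmul(guide, w_v)
    scale = 1.0 / math.sqrt(len(w_q[0]))  # scaled dot-product attention
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    attended = matmul(softmax_rows(scores), v)
    # Residual connection: the base feature keeps its own content.
    return [[b + a for b, a in zip(b_row, a_row)] for b_row, a_row in zip(base, attended)]


def cgam_bidirectional(rgb, thermal, params_rgb, params_thermal):
    """Hypothetical bidirectional step: each modality serves once as the guide.

    Both directions read the original, unrefined inputs, so they can run in
    parallel and neither refinement sees the other's output.
    """
    rgb_refined = guided_attention(rgb, thermal, *params_rgb)
    thermal_refined = guided_attention(thermal, rgb, *params_thermal)
    return rgb_refined, thermal_refined


# Usage: two tokens with two channels per modality, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
params = (identity, identity, identity)
rgb = [[1.0, 0.0], [0.0, 1.0]]
thermal = [[0.0, 2.0], [2.0, 0.0]]
rgb_out, thermal_out = cgam_bidirectional(rgb, thermal, params, params)
```

Computing both directions from the unrefined inputs is what makes the role exchange parallel; a sequential variant would instead pass `rgb_refined` into the second call, so the thermal branch would be guided by an already-refined RGB feature.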
Pages: 144-150
Number of pages: 7