Cross-Modal Local Calibration and Global Context Modeling Network for RGB–Infrared Remote-Sensing Object Detection

Cited by: 7
Authors
Xie, Jin [1 ]
Nie, Jing [2 ]
Ding, Bonan [1 ]
Yu, Mingyang [1 ]
Cao, Jiale [3 ,4 ]
Affiliations
[1] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 400044, Peoples R China
[2] Chongqing Univ, Sch Microelect & Commun Engn, Chongqing 400044, Peoples R China
[3] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[4] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Multimodal fusion; object detection; remote-sensing object detection; IMAGERY;
DOI
10.1109/JSTARS.2023.3315544
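Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject classification codes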
0808 ; 0809 ;
Abstract
RGB-infrared object detection in remote-sensing images is crucial for around-the-clock surveillance by unmanned aerial vehicles. Deep-learning-based RGB-infrared remote-sensing object detection methods usually mine the complementary information in the RGB and infrared modalities through feature aggregation, with the aim of robust detection in around-the-clock applications. Most existing methods aggregate features from RGB and infrared images with elementwise operations (e.g., elementwise addition or concatenation). The detection accuracy of these methods is limited, for two main reasons: local location misalignment across modalities and insufficient extraction of nonlocal contextual information. To address these issues, we propose a cross-modal local calibration and global context modeling network (CLGNet), consisting of two novel modules: a cross-modal local calibration (CLC) module and a cross-modal global context (CGC) modeling module. The CLC module first aligns features from the different modalities and then aggregates them selectively. The CGC module is embedded into the backbone network to capture cross-modal nonlocal long-range dependencies. Experimental results on two popular RGB-infrared remote-sensing object detection datasets, DroneVehicle and VEDAI, demonstrate the effectiveness and efficiency of our CLGNet.
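As a rough illustration of the baseline fusion operations the abstract mentions (elementwise addition and concatenation), the following minimal NumPy sketch combines two feature maps. The function names, the `(C, H, W)` layout, and the random inputs are assumptions made for illustration only; this is not the authors' CLGNet, CLC, or CGC implementation.

```python
import numpy as np


def fuse_add(rgb_feat, ir_feat):
    """Elementwise addition: the output keeps the input shape (C, H, W)."""
    return rgb_feat + ir_feat


def fuse_concat(rgb_feat, ir_feat):
    """Channel concatenation: the output has shape (2C, H, W)."""
    return np.concatenate([rgb_feat, ir_feat], axis=0)


# Hypothetical feature maps standing in for backbone outputs of each modality.
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((8, 16, 16))
ir_feat = rng.standard_normal((8, 16, 16))

added = fuse_add(rgb_feat, ir_feat)
stacked = fuse_concat(rgb_feat, ir_feat)
print(added.shape, stacked.shape)
```

Both operations pair features at the same spatial index, so a spatial offset between the RGB and infrared images carries straight into the fused map, and neither operation looks beyond a single location. These are the two limitations the abstract names as motivation for the CLC and CGC modules.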
Pages: 8933-8942
Number of pages: 10
Related papers
50 in total
  • [21] Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information
    Yuan, Zhiqiang
    Zhang, Wenkai
    Tian, Changyuan
    Rong, Xuee
    Zhang, Zhengyuan
    Wang, Hongqi
    Fu, Kun
    Sun, Xian
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [22] Lightweight cross-modal transformer for RGB-D salient object detection
    Huang, Nianchang
    Yang, Yang
    Zhang, Qiang
    Han, Jungong
    Huang, Jin
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249
  • [23] RGB-D salient object detection with asymmetric cross-modal fusion
    Yu M.
    Xing Z.-H.
    Liu Y.
    Kongzhi yu Juece/Control and Decision, 2023, 38 (09): : 2487 - 2495
  • [24] Global-local-global context-aware network for salient object detection in optical remote sensing images
    Bai, Zhen
    Li, Gongyang
    Liu, Zhi
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2023, 198 : 184 - 196
  • [25] Global to Local: A Scale-Aware Network for Remote Sensing Object Detection
    Gao, Tao
    Niu, Qianqian
    Zhang, Jing
    Chen, Ting
    Mei, Shaohui
    Jubair, Ahmad
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [26] MFCANet: Multiscale Feature Context Aggregation Network for Oriented Object Detection in Remote-Sensing Images
    Jiang, Honghui
    Luo, Tingting
    Peng, Hu
    Zhang, Guozheng
    IEEE ACCESS, 2024, 12 : 45986 - 46001
  • [27] A cross-modal adaptive gated fusion generative adversarial network for RGB-D salient object detection
    Liu, Zhengyi
    Zhang, Wei
    Zhao, Peng
    NEUROCOMPUTING, 2020, 387 : 210 - 220
  • [28] Cross-modal and multi-level feature refinement network for RGB-D salient object detection
    Gao, Yue
    Dai, Meng
    Zhang, Qing
    VISUAL COMPUTER, 2023, 39 (09): : 3979 - 3994
  • [29] CMA-SOD: cross-modal attention fusion network for RGB-D salient object detection
    Wang, Kexuan
    Liu, Chenhua
    Zhang, Rongfu
    VISUAL COMPUTER, 2024,
  • [30] Cross-modal and multi-level feature refinement network for RGB-D salient object detection
    Yue Gao
    Meng Dai
    Qing Zhang
    The Visual Computer, 2023, 39 : 3979 - 3994