Transformer-based cross-modality interaction guidance network for RGB-T salient object detection

Cited by: 1
Authors
Luo, Jincheng [1 ]
Li, Yongjun [1 ]
Li, Bo [1 ]
Zhang, Xinru [1 ]
Li, Chaoyue [1 ]
Chenjin, Zhimin [1 ]
He, Jingyi [1 ]
Liang, Yifei [1 ]
Affiliations
[1] Henan Univ, Sch Phys & Elect, Kaifeng 475004, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Salient object detection; RGB-thermal images; Transformer; Feature fusion; FEATURE INTEGRATION NETWORK; FEATURE FUSION; ATTENTION; MODEL; DECODER; CONTEXT;
DOI
10.1016/j.neucom.2024.128149
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Exploring more effective multimodal fusion strategies remains challenging for RGB-T salient object detection (SOD). Most RGB-T SOD methods focus on acquiring modal complementary features from foreground information while ignoring the importance of background information for salient object localization. In addition, feature fusion without information filtering may introduce noise. To address these problems, this paper proposes a new cross-modality interaction guidance network (CIGNet) for RGB-T salient object detection. Specifically, we construct a transformer-based dual-stream encoder to extract multimodal features. In the decoder, we propose an attention-based modal information complementary module (MICM) that captures cross-modal complementary information for global comparison and salient object localization. Based on the MICM features, we design a multi-scale adaptive fusion module (MAFM) to identify the optimal salient regions during multi-scale fusion and to reduce redundant features. To enhance the completeness of salient features after multi-scale fusion, we further propose a saliency region mining module (SRMM), which corrects features in the boundary neighborhood by exploiting the differences between foreground pixels, background pixels, and the boundary. In comparisons with state-of-the-art methods on three RGB-T datasets and five RGB-D datasets, the experimental results demonstrate the superiority and generalizability of the proposed CIGNet.
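For concreteness, the following is a minimal sketch of the kind of attention-based cross-modal complement described in the abstract: a bidirectional cross-attention block in which the RGB and thermal streams attend to each other before fusion. It assumes PyTorch, and every module and variable name (CrossModalComplement, fuse, etc.) is hypothetical; it illustrates the general idea behind the MICM, not the authors' actual implementation.

# Minimal PyTorch sketch of bidirectional cross-modal attention for RGB-T feature
# exchange. Names, shapes, and the fusion rule are illustrative assumptions only;
# this is not the authors' MICM implementation.
import torch
import torch.nn as nn


class CrossModalComplement(nn.Module):
    """Exchange complementary cues between RGB and thermal feature maps."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Cross-attention in both directions: RGB queries thermal and vice versa.
        self.rgb_from_t = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.t_from_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_rgb: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_rgb.shape
        # Flatten spatial dimensions into token sequences: (B, H*W, C).
        rgb_seq = f_rgb.flatten(2).transpose(1, 2)
        t_seq = f_t.flatten(2).transpose(1, 2)
        # Each modality attends to the other to pick up complementary information.
        rgb_enh, _ = self.rgb_from_t(rgb_seq, t_seq, t_seq)
        t_enh, _ = self.t_from_rgb(t_seq, rgb_seq, rgb_seq)
        # Restore the spatial layout and fuse with a 1x1 convolution.
        rgb_enh = rgb_enh.transpose(1, 2).reshape(b, c, h, w)
        t_enh = t_enh.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([rgb_enh, t_enh], dim=1))


if __name__ == "__main__":
    module = CrossModalComplement(channels=64)
    rgb_feat = torch.randn(2, 64, 20, 20)   # RGB branch features
    t_feat = torch.randn(2, 64, 20, 20)     # thermal branch features
    print(module(rgb_feat, t_feat).shape)   # torch.Size([2, 64, 20, 20])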
Pages: 14
Related Papers
50 records in total
  • [1] Cross-Modality Double Bidirectional Interaction and Fusion Network for RGB-T Salient Object Detection
    Xie, Zhengxuan
    Shao, Feng
    Chen, Gang
    Chen, Hangwei
    Jiang, Qiuping
    Meng, Xiangchao
    Ho, Yo-Sung
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (08) : 4149 - 4163
  • [2] Transformer-Based Cross-Modal Integration Network for RGB-T Salient Object Detection
    Lv, Chengtao
    Zhou, Xiaofei
    Wan, Bin
    Wang, Shuai
    Sun, Yaoqi
    Zhang, Jiyong
    Yan, Chenggang
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (02) : 4741 - 4755
  • [3] CAFCNet: Cross-modality asymmetric feature complement network for RGB-T salient object detection
    Jin, Dongze
    Shao, Feng
    Xie, Zhengxuan
    Mu, Baoyang
    Chen, Hangwei
    Jiang, Qiuping
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 247
  • [4] Transformer-based Adaptive Interactive Promotion Network for RGB-T Salient Object Detection
    Zhu, Jinchao
    Zhang, Xiaoyu
    Dong, Feng
    Yan, Siyu
    Meng, Xianbang
    Li, Yuehua
    Tan, Panlong
    2022 34TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2022, : 1989 - 1994
  • [5] Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection
    Zhang, Chen
    Cong, Runmin
    Lin, Qinwei
    Ma, Lin
    Li, Feng
    Zhao, Yao
    Kwong, Sam
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2094 - 2102
  • [6] Asymmetric cross-modality interaction network for RGB-D salient object detection
    Su, Yiming
    Gao, Haoran
    Wang, Mengyin
    Wang, Fasheng
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 275
  • [7] Enabling modality interactions for RGB-T salient object detection
    Zhang, Qiang
    Xi, Ruida
    Xiao, Tonglin
    Huang, Nianchang
    Luo, Yongjiang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2022, 222
  • [8] CGINet: Cross-modality grade interaction network for RGB-T crowd counting
    Pan, Yi
    Zhou, Wujie
    Qian, Xiaohong
    Mao, Shanshan
    Yang, Rongwang
    Yu, Lu
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
  • [9] Feature aggregation with transformer for RGB-T salient object detection
    Zhang, Ping
    Xu, Mengnan
    Zhang, Ziyan
    Gao, Pan
    Zhang, Jing
    NEUROCOMPUTING, 2023, 546
  • [10] CGMDRNet: Cross-Guided Modality Difference Reduction Network for RGB-T Salient Object Detection
    Chen, Gang
    Shao, Feng
    Chai, Xiongli
    Chen, Hangwei
    Jiang, Qiuping
    Meng, Xiangchao
    Ho, Yo-Sung
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6308 - 6323