Transformer-based cross-modality interaction guidance network for RGB-T salient object detection

Cited by: 1
Authors
Luo, Jincheng [1 ]
Li, Yongjun [1 ]
Li, Bo [1 ]
Zhang, Xinru [1 ]
Li, Chaoyue [1 ]
Chenjin, Zhimin [1 ]
He, Jingyi [1 ]
Liang, Yifei [1 ]
Affiliations
[1] Henan Univ, Sch Phys & Elect, Kaifeng 475004, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Salient object detection; RGB-thermal images; Transformer; Feature fusion; FEATURE INTEGRATION NETWORK; FEATURE FUSION; ATTENTION; MODEL; DECODER; CONTEXT;
DOI
10.1016/j.neucom.2024.128149
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Exploring more effective multimodal fusion strategies remains challenging for RGB-T salient object detection (SOD). Most RGB-T SOD methods focus on acquiring modal complementary features from foreground information while ignoring the importance of background information for salient object localization. In addition, feature fusion without information filtering may introduce noise. To address these problems, this paper proposes a new cross-modality interaction guidance network (CIGNet) for RGB-T salient object detection. Specifically, we construct a transformer-based dual-stream encoder to extract multimodal features. In the decoder, we propose an attention-based modal information complementary module (MICM) that captures cross-modal complementary information for global comparison and salient object localization. Based on the MICM features, we design a multi-scale adaptive fusion module (MAFM) to identify the optimal salient regions during multi-scale fusion and to reduce redundant features. To enhance the completeness of salient features after multi-scale fusion, we further propose a saliency region mining module (SRMM), which corrects features in the boundary neighborhood by exploiting the differences between foreground pixels, background pixels, and the boundary. In comparisons with state-of-the-art methods on three RGB-T datasets and five RGB-D datasets, the experimental results demonstrate the superiority and generalizability of the proposed CIGNet.
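For concreteness, the following is a minimal sketch of the kind of attention-based cross-modal complement described in the abstract: a bidirectional cross-attention block in which the RGB and thermal streams attend to each other before fusion. It assumes PyTorch, and every module and variable name (CrossModalComplement, fuse, etc.) is hypothetical; it illustrates the general idea behind the MICM, not the authors' actual implementation.

# Minimal PyTorch sketch of bidirectional cross-modal attention for RGB-T feature
# exchange. Names, shapes, and the fusion rule are illustrative assumptions only;
# this is not the authors' MICM implementation.
import torch
import torch.nn as nn


class CrossModalComplement(nn.Module):
    """Exchange complementary cues between RGB and thermal feature maps."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Cross-attention in both directions: RGB queries thermal and vice versa.
        self.rgb_from_t = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.t_from_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_rgb: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_rgb.shape
        # Flatten spatial dimensions into token sequences: (B, H*W, C).
        rgb_seq = f_rgb.flatten(2).transpose(1, 2)
        t_seq = f_t.flatten(2).transpose(1, 2)
        # Each modality attends to the other to pick up complementary information.
        rgb_enh, _ = self.rgb_from_t(rgb_seq, t_seq, t_seq)
        t_enh, _ = self.t_from_rgb(t_seq, rgb_seq, rgb_seq)
        # Restore the spatial layout and fuse with a 1x1 convolution.
        rgb_enh = rgb_enh.transpose(1, 2).reshape(b, c, h, w)
        t_enh = t_enh.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([rgb_enh, t_enh], dim=1))


if __name__ == "__main__":
    module = CrossModalComplement(channels=64)
    rgb_feat = torch.randn(2, 64, 20, 20)   # RGB branch features
    t_feat = torch.randn(2, 64, 20, 20)     # thermal branch features
    print(module(rgb_feat, t_feat).shape)   # torch.Size([2, 64, 20, 20])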
Pages: 14
Related Papers
50 records in total
  • [1] Cross-Modality Double Bidirectional Interaction and Fusion Network for RGB-T Salient Object Detection
    Xie, Zhengxuan
    Shao, Feng
    Chen, Gang
    Chen, Hangwei
    Jiang, Qiuping
    Meng, Xiangchao
    Ho, Yo-Sung
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (08) : 4149 - 4163
  • [2] Transformer-Based Cross-Modal Integration Network for RGB-T Salient Object Detection
    Lv, Chengtao
    Zhou, Xiaofei
    Wan, Bin
    Wang, Shuai
    Sun, Yaoqi
    Zhang, Jiyong
    Yan, Chenggang
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (02) : 4741 - 4755
  • [3] CAFCNet: Cross-modality asymmetric feature complement network for RGB-T salient object detection
    Jin, Dongze
    Shao, Feng
    Xie, Zhengxuan
    Mu, Baoyang
    Chen, Hangwei
    Jiang, Qiuping
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 247
  • [4] Transformer-based Adaptive Interactive Promotion Network for RGB-T Salient Object Detection
    Zhu, Jinchao
    Zhang, Xiaoyu
    Dong, Feng
    Yan, Siyu
    Meng, Xianbang
    Li, Yuehua
    Tan, Panlong
    2022 34TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2022, : 1989 - 1994
  • [5] Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection
    Zhang, Chen
    Cong, Runmin
    Lin, Qinwei
    Ma, Lin
    Li, Feng
    Zhao, Yao
    Kwong, Sam
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2094 - 2102
  • [6] Asymmetric cross-modality interaction network for RGB-D salient object detection
    Su, Yiming
    Gao, Haoran
    Wang, Mengyin
    Wang, Fasheng
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 275
  • [7] Enabling modality interactions for RGB-T salient object detection
    Zhang, Qiang
    Xi, Ruida
    Xiao, Tonglin
    Huang, Nianchang
    Luo, Yongjiang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2022, 222
  • [8] CGINet: Cross-modality grade interaction network for RGB-T crowd counting
    Pan, Yi
    Zhou, Wujie
    Qian, Xiaohong
    Mao, Shanshan
    Yang, Rongwang
    Yu, Lu
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
  • [9] Feature aggregation with transformer for RGB-T salient object detection
    Zhang, Ping
    Xu, Mengnan
    Zhang, Ziyan
    Gao, Pan
    Zhang, Jing
    NEUROCOMPUTING, 2023, 546
  • [10] CGMDRNet: Cross-Guided Modality Difference Reduction Network for RGB-T Salient Object Detection
    Chen, Gang
    Shao, Feng
    Chai, Xiongli
    Chen, Hangwei
    Jiang, Qiuping
    Meng, Xiangchao
    Ho, Yo-Sung
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6308 - 6323