Enhanced RGB-T saliency detection via thermal-guided multi-stage attention network

Cited by: 1
Authors
Pang, Yu [1 ]
Huang, Yang [1 ]
Weng, Chenyu [1 ]
Lyu, Jialin [1 ]
Bai, Chuanyue [1 ]
Yu, Xiaosheng [2 ]
Affiliations
[1] Shenyang Univ Technol, Sch Artificial Intelligence, Shenyang 110870, Liaoning, Peoples R China
[2] Northeastern Univ, Fac Robot Sci & Engn, Shenyang 110169, Liaoning, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RGB-T saliency detection; Single-stream network; Multi-stage framework; Modality-interaction; Attention mechanism; FUSION;
DOI
10.1007/s00371-025-03855-3
CLC Classification
TP31 [Computer Software]
Discipline Code
081202; 0835
Abstract
Single-stream structures are prevalent in RGB-T saliency detection due to their efficiency and lightweight nature. However, existing multi-modal single-stream methods suffer from limited detection performance, primarily because they do not fully exploit the strengths of the thermal modality. To address this, we propose a novel single-stream network called the Thermal-induced Modality-interaction Multi-stage Attention Network (TMMANet). Our approach leverages thermal-induced attention mechanisms in both the encoder and decoder stages to effectively integrate RGB and thermal modalities. In the encoder, a Thermal-induced Modality-interaction Self-Attention mechanism is introduced to extract powerful cross-modal features. In the decoder, a Thermal-induced Modality-interaction Dual-Branch Attention mechanism is designed to generate accurate saliency predictions by constructing a modality-aware integration of foreground and background branches. Extensive experiments demonstrate that TMMANet outperforms most state-of-the-art RGB-T, RGB, and RGB-D methods across various evaluation metrics, highlighting its effectiveness in enhancing RGB-T saliency detection performance. The related data of our TMMANet are released at https://github.com/SUTPangYu/TMMANet.
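
Since only the abstract is available in this record, the sketch below is a minimal, hypothetical illustration (not the authors' released TMMANet code; see their GitHub link above) of the general idea of thermal-induced cross-modal attention: thermal features form the queries that attend over RGB features so that the thermal modality steers the fusion. The module name, channel sizes, head count, and residual fusion are illustrative assumptions, written in PyTorch.

# Minimal illustrative sketch, assuming channel-aligned RGB and thermal
# feature maps of equal spatial size. Not the authors' implementation.
import torch
import torch.nn as nn

class ThermalGuidedAttention(nn.Module):
    """Hypothetical block: thermal queries attend over RGB keys/values,
    so the thermal modality guides the cross-modal interaction."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # rgb, thermal: (B, C, H, W) -> flatten spatial dims into token sequences
        b, c, h, w = rgb.shape
        rgb_tokens = rgb.flatten(2).transpose(1, 2)     # (B, HW, C)
        th_tokens = thermal.flatten(2).transpose(1, 2)  # (B, HW, C)
        # Cross-modal attention: thermal features query the RGB features.
        fused, _ = self.attn(query=th_tokens, key=rgb_tokens, value=rgb_tokens)
        fused = self.norm(fused + rgb_tokens)           # residual fusion + norm
        return fused.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    block = ThermalGuidedAttention(channels=64)
    rgb_feat = torch.randn(2, 64, 28, 28)
    thermal_feat = torch.randn(2, 64, 28, 28)
    print(block(rgb_feat, thermal_feat).shape)  # torch.Size([2, 64, 28, 28])
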
Pages: 8055-8073
Number of pages: 19