Enhanced RGB-T saliency detection via thermal-guided multi-stage attention network

Cited by: 1
Authors
Pang, Yu [1 ]
Huang, Yang [1 ]
Weng, Chenyu [1 ]
Lyu, Jialin [1 ]
Bai, Chuanyue [1 ]
Yu, Xiaosheng [2 ]
Affiliations
[1] Shenyang Univ Technol, Sch Artificial Intelligence, Shenyang 110870, Liaoning, Peoples R China
[2] Northeastern Univ, Fac Robot Sci & Engn, Shenyang 110169, Liaoning, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RGB-T saliency detection; Single-stream network; Multi-stage framework; Modality-interaction; Attention mechanism; FUSION;
DOI
10.1007/s00371-025-03855-3
CLC Classification
TP31 [Computer Software]
Discipline Code
081202; 0835
Abstract
Single-stream structures are prevalent in RGB-T saliency detection due to their efficiency and lightweight nature. However, existing multi-modal single-stream methods suffer from limited detection performance, primarily because they do not fully exploit the strengths of the thermal modality. To address this, we propose a novel single-stream network called the Thermal-induced Modality-interaction Multi-stage Attention Network (TMMANet). Our approach leverages thermal-induced attention mechanisms in both the encoder and decoder stages to effectively integrate RGB and thermal modalities. In the encoder, a Thermal-induced Modality-interaction Self-Attention mechanism is introduced to extract powerful cross-modal features. In the decoder, a Thermal-induced Modality-interaction Dual-Branch Attention mechanism is designed to generate accurate saliency predictions by constructing a modality-aware integration of foreground and background branches. Extensive experiments demonstrate that TMMANet outperforms most state-of-the-art RGB-T, RGB, and RGB-D methods across various evaluation metrics, highlighting its effectiveness in enhancing RGB-T saliency detection performance. The related data of our TMMANet are released at https://github.com/SUTPangYu/TMMANet.
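
Since only the abstract is available in this record, the sketch below is a minimal, hypothetical illustration (not the authors' released TMMANet code; see their GitHub link above) of the general idea of thermal-induced cross-modal attention: thermal features form the queries that attend over RGB features so that the thermal modality steers the fusion. The module name, channel sizes, head count, and residual fusion are illustrative assumptions, written in PyTorch.

# Minimal illustrative sketch, assuming channel-aligned RGB and thermal
# feature maps of equal spatial size. Not the authors' implementation.
import torch
import torch.nn as nn

class ThermalGuidedAttention(nn.Module):
    """Hypothetical block: thermal queries attend over RGB keys/values,
    so the thermal modality guides the cross-modal interaction."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # rgb, thermal: (B, C, H, W) -> flatten spatial dims into token sequences
        b, c, h, w = rgb.shape
        rgb_tokens = rgb.flatten(2).transpose(1, 2)     # (B, HW, C)
        th_tokens = thermal.flatten(2).transpose(1, 2)  # (B, HW, C)
        # Cross-modal attention: thermal features query the RGB features.
        fused, _ = self.attn(query=th_tokens, key=rgb_tokens, value=rgb_tokens)
        fused = self.norm(fused + rgb_tokens)           # residual fusion + norm
        return fused.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    block = ThermalGuidedAttention(channels=64)
    rgb_feat = torch.randn(2, 64, 28, 28)
    thermal_feat = torch.randn(2, 64, 28, 28)
    print(block(rgb_feat, thermal_feat).shape)  # torch.Size([2, 64, 28, 28])
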
Pages: 8055-8073
Number of pages: 19