HRTransNet: HRFormer-Driven Two-Modality Salient Object Detection

Times cited: 57

Authors:

Tang, Bin [1]

Liu, Zhengyi [2]

Tan, Yacheng [2]

He, Qian [2]

Affiliations:

[1] Hefei Univ, Sch Artificial Intelligence & Big Data, Hefei 230601, Peoples R China

[2] Anhui Univ, Sch Comp Sci & Technol, Key Lab Intelligent Comp & Signal Proc, Minist Educ, Hefei 230601, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2023 / Vol. 33 / No. 02

Keywords:

Task analysis; Convolution; Transformers; Object detection; Feature extraction; Convolutional neural networks; Streaming media; HRFormer; salient object detection; cross modality; RGB-D; RGB-T; light field; RGB-D IMAGE; NETWORK; FUSION;

DOI:

10.1109/TCSVT.2022.3202563

Chinese Library Classification (CLC) numbers:

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809

Abstract:

The High-Resolution Transformer (HRFormer) maintains high-resolution representations and provides global receptive fields. This makes it well suited to salient object detection (SOD), in which the input and output have the same resolution. However, two critical problems must be solved before it can be applied to two-modality SOD. The first is how to fuse the two modalities; the second is how to fuse the outputs of HRFormer. To address the first problem, the supplementary modality is injected into the primary modality at the input level, using global optimization and an attention mechanism to select and purify the supplementary features. To address the second problem, a dual-direction short connection fusion module optimizes the output features of HRFormer, enhancing the detailed representation of objects at the output level. The proposed model, named HRTransNet, first introduces an auxiliary stream that extracts features from the supplementary modality. These features are then injected into the primary modality at the beginning of each multi-resolution branch. Next, HRFormer performs forward propagation. Finally, the output features at all resolutions are aggregated by intra-feature and inter-feature interactive transformers. The proposed model yields substantial improvements on two-modality SOD tasks, e.g., RGB-D, RGB-T, and light field SOD. Code: https://github.com/liuzywen/HRTransNet
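The abstract describes the two fusion stages only at a high level. The NumPy sketch below illustrates the general ideas under stated assumptions and is not the HRTransNet implementation (the linked repository holds the authors' code). Assumptions: "global optimization" is approximated by global average pooling, the attention mechanism is a hypothetical channel gate combined with a spatial gate, and the output fusion is a generic top-down plus bottom-up pass with nearest-neighbour resizing, standing in for the paper's dual-direction short connection module and omitting its intra-/inter-feature transformers. The names `inject_supplementary`, `dual_direction_fusion`, `resize`, and the weight `w_gate` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def inject_supplementary(primary, supplementary, w_gate):
    """Hypothetical input-level injection of a supplementary modality.

    primary, supplementary: (C, H, W) feature maps.
    w_gate: (C, 2C) weights of an assumed channel-attention gate.
    """
    # Global average pooling of both modalities gives a global channel descriptor (2C,).
    desc = np.concatenate([primary.mean(axis=(1, 2)), supplementary.mean(axis=(1, 2))])
    # Channel gate in (0, 1) selects which supplementary channels to keep.
    gate = sigmoid(w_gate @ desc)  # (C,)
    # Spatial gate from the primary stream suppresses unreliable supplementary regions.
    spatial = sigmoid(primary.mean(axis=0, keepdims=True))  # (1, H, W)
    return primary + gate[:, None, None] * spatial * supplementary


def resize(x, size):
    """Nearest-neighbour resize of a square (C, H, H) map to (C, size, size)."""
    _, h, w = x.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return x[:, rows][:, :, cols]


def dual_direction_fusion(features):
    """Generic two-direction fusion; features are ordered from finest to coarsest."""
    n = len(features)
    # Top-down pass: coarse context flows to finer resolutions.
    top_down = [None] * n
    top_down[-1] = features[-1]
    for i in range(n - 2, -1, -1):
        top_down[i] = features[i] + resize(top_down[i + 1], features[i].shape[1])
    # Bottom-up pass: fine detail flows back to coarser resolutions.
    bottom_up = [top_down[0]]
    for i in range(1, n):
        bottom_up.append(top_down[i] + resize(bottom_up[i - 1], features[i].shape[1]))
    # Aggregate every level at the highest resolution.
    out_size = features[0].shape[1]
    return sum(resize(f, out_size) for f in bottom_up)


# Usage with random stand-in features (e.g. RGB as primary, depth as supplementary).
rng = np.random.default_rng(0)
C = 8
rgb = rng.standard_normal((C, 32, 32))
depth = rng.standard_normal((C, 32, 32))
w_gate = rng.standard_normal((C, 2 * C)) * 0.1
fused = inject_supplementary(rgb, depth, w_gate)
print(fused.shape)  # (8, 32, 32)

# Four branches at decreasing resolution, loosely mirroring HRFormer's multi-resolution outputs.
feats = [rng.standard_normal((C, s, s)) for s in (32, 16, 8, 4)]
out = dual_direction_fusion(feats)
print(out.shape)  # (8, 32, 32)
```

The residual form `primary + gate * spatial * supplementary` keeps the primary stream intact when the gates close, so in this sketch a noisy supplementary input can at worst be ignored rather than overwrite the primary features.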

Pages: 728–742

Number of pages: 15
