HRTransNet: HRFormer-Driven Two-Modality Salient Object Detection

Cited by: 57
Authors
Tang, Bin [1]
Liu, Zhengyi [2]
Tan, Yacheng [2]
He, Qian [2]
Affiliations
[1] Hefei Univ, Sch Artificial Intelligence & Big Data, Hefei 230601, Peoples R China
[2] Anhui Univ, Sch Comp Sci & Technol, Key Lab Intelligent Comp & Signal Proc, Minist Educ, Hefei 230601, Peoples R China
Keywords
Task analysis; Convolution; Transformers; Object detection; Feature extraction; Convolutional neural networks; Streaming media; HRFormer; salient object detection; cross modality; RGB-D; RGB-T; light field; RGB-D IMAGE; NETWORK; FUSION;
DOI
10.1109/TCSVT.2022.3202563
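Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes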
0808 ; 0809 ;
Abstract
The High-Resolution Transformer (HRFormer) maintains a high-resolution representation while sharing global receptive fields. This suits salient object detection (SOD), in which the input and output have the same resolution. However, two critical problems must be solved for two-modality SOD: fusing the two modalities, and fusing the outputs of HRFormer. To address the first problem, a supplementary modality is injected into the primary modality, using global optimization and an attention mechanism to select and purify the modality at the input level. To solve the second problem, a dual-direction short connection fusion module optimizes the output features of HRFormer, thereby enhancing the detailed representation of objects at the output level. The proposed model, named HRTransNet, first introduces an auxiliary stream for feature extraction of the supplementary modality. Then, these features are injected into the primary modality at the beginning of each multi-resolution branch. Next, HRFormer is applied for forward propagation. Finally, all the output features with different resolutions are aggregated by intra-feature and inter-feature interactive transformers. The proposed model achieves impressive improvements on two-modality SOD tasks, e.g., RGB-D, RGB-T, and light field SOD. Code: https://github.com/liuzywen/HRTransNet
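The input-level injection described in the abstract can be pictured with a toy sketch. The abstract does not specify the module, so the gating used here (global average pooling, a sigmoid gate, and a residual addition) is an assumption chosen for illustration and is not the authors' implementation. The names `sigmoid` and `inject` are hypothetical, and features are plain Python lists (one flat list per channel) to keep the example dependency-free.

```python
import math


def sigmoid(x):
    """Logistic function used as a soft channel gate."""
    return 1.0 / (1.0 + math.exp(-x))


def inject(primary, supplementary):
    """Add channel-gated supplementary features to the primary features.

    Assumption: both inputs are lists of channels with matching shapes,
    and each channel is a flat list of floats (flattened spatial grid).
    """
    if len(primary) != len(supplementary):
        raise ValueError("channel counts must match")
    fused = []
    for p_ch, s_ch in zip(primary, supplementary):
        if len(p_ch) != len(s_ch):
            raise ValueError("spatial sizes must match")
        # Global average pooling: one descriptor per supplementary channel.
        gate = sigmoid(sum(s_ch) / len(s_ch))
        # The gate scales the supplementary channel before the residual add.
        fused.append([p + gate * s for p, s in zip(p_ch, s_ch)])
    return fused


# Toy features: 2 channels, 2 spatial positions each.
rgb = [[1.0, 2.0], [3.0, 4.0]]        # primary modality
depth = [[0.0, 0.0], [10.0, 10.0]]    # supplementary modality
fused = inject(rgb, depth)
```

In this sketch a supplementary channel with a low mean response receives a gate near 0.5 or below, while a strongly responding channel is passed through almost unchanged, which is one simple way to "select" supplementary information per channel.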
Pages: 728-742
Number of pages: 15