ETFormer: An Efficient Transformer Based on Multimodal Hybrid Fusion and Representation Learning for RGB-D-T Salient Object Detection

Cited by: 1
Authors
Qiu, Jiyuan [1 ]
Jiang, Chen [1 ]
Wang, Haowen [1 ]
Affiliations
[1] Tsinghua Univ, Sch Aerosp Engn, Beijing 100084, Peoples R China
Keywords
Feature extraction; Training; Decoding; Computer architecture; Transformers; Representation learning; Object detection; Multimodal hybrid fusion; representation learning; RGB-D-T salient object detection; transformer; NETWORK;
DOI
10.1109/LSP.2024.3465351
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification codes
0808 ; 0809 ;
Abstract
Because depth and thermal images are susceptible to environmental interference, researchers have begun to combine three modalities for salient object detection (SOD). In this letter, we propose an efficient transformer network (ETFormer) based on multimodal hybrid fusion and representation learning for RGB-D-T SOD. First, unlike most prior works, we design a backbone that extracts information from all three modalities, and we propose a multi-modal multi-head attention module (MMAM) for feature fusion, which improves network performance while reducing computational redundancy. Second, we reassemble a three-modal dataset, R-D-T ImageNet-1K, to pretrain the network, addressing the problem that the non-RGB modalities are otherwise still pretrained on RGB data. Finally, extensive experiments show that the proposed method combines the advantages of the different modalities and achieves better performance than existing methods.
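The record does not specify the internals of the MMAM. As a rough illustration of the general idea of multi-head-style cross-modal attention fusion, here is a minimal NumPy sketch; the function name, the choice of RGB tokens as queries, and the single-head simplification are assumptions for illustration, not the authors' design:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multimodal_attention_fusion(rgb, depth, thermal):
    """Fuse three modality token sets (N tokens x d dims each).

    Queries come from the RGB tokens; keys/values are the concatenation
    of all three modalities, so each RGB token attends across modalities.
    (Hypothetical single-head simplification of an MMAM-like module.)
    """
    kv = np.concatenate([rgb, depth, thermal], axis=0)  # (3N, d)
    d = rgb.shape[1]
    scores = rgb @ kv.T / np.sqrt(d)                    # (N, 3N)
    weights = softmax(scores, axis=-1)                  # rows sum to 1
    return weights @ kv                                 # fused (N, d)

N, d = 4, 8
rng = np.random.default_rng(0)
fused = multimodal_attention_fusion(rng.normal(size=(N, d)),
                                    rng.normal(size=(N, d)),
                                    rng.normal(size=(N, d)))
print(fused.shape)  # (4, 8)
```

Sharing one attention computation over concatenated modality tokens, rather than running a separate attention block per modality pair, is one plausible way such a module could reduce redundant computation.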
Pages: 2930-2934
Page count: 5
Related papers
50 records in total
  • [31] Saliency Prototype for RGB-D and RGB-T Salient Object Detection
    Zhang, Zihao
    Wang, Jie
    Han, Yahong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3696 - 3705
  • [32] Transformer-based Adaptive Interactive Promotion Network for RGB-T Salient Object Detection
    Zhu, Jinchao
    Zhang, Xiaoyu
    Dong, Feng
    Yan, Siyu
    Meng, Xianbang
    Li, Yuehua
    Tan, Panlong
    2022 34TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2022, : 1989 - 1994
  • [33] STAMF: Synergistic transformer and mamba fusion network for RGB-Polarization based underwater salient object detection
    Ma, Qianwen
    Li, Xiaobo
    Li, Bincheng
    Zhu, Zhen
    Wu, Jing
    Huang, Feng
    Hu, Haofeng
    INFORMATION FUSION, 2025, 122
  • [34] Aggregate interactive learning for RGB-D salient object detection
    Wu, Jingyu
    Sun, Fuming
    Xu, Rui
    Meng, Jie
    Wang, Fasheng
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 195
  • [35] ECFFNet: Effective and Consistent Feature Fusion Network for RGB-T Salient Object Detection
    Zhou, Wujie
    Guo, Qinling
    Lei, Jingsheng
    Yu, Lu
    Hwang, Jenq-Neng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (03) : 1224 - 1235
  • [36] Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection
    Wang, Shuaihui
    Jiang, Fengyi
    Xu, Boqian
    SENSORS, 2023, 23 (21)
  • [37] RGB-D Salient Object Detection via Joint Learning and Multi-feature Fusion
    Chen, Peng
    Li, BenHang
    2022 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, COMPUTER VISION AND MACHINE LEARNING (ICICML), 2022, : 547 - 552
  • [38] TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
    Liu, Zhengyi
    Wang, Yuan
    Tu, Zhengzheng
    Xiao, Yun
    Tang, Bin
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4481 - 4490
  • [39] Discriminative Cross-Modal Transfer Learning and Densely Cross-Level Feedback Fusion for RGB-D Salient Object Detection
    Chen, Hao
    Li, Youfu
    Su, Dan
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (11) : 4808 - 4820
  • [40] Masked Visual Pre-training for RGB-D and RGB-T Salient Object Detection
    Qi, Yanyu
    Guo, Ruohao
    Li, Zhenbo
    Niu, Dantong
    Qu, Liao
    PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 49 - 66