ETFormer: An Efficient Transformer Based on Multimodal Hybrid Fusion and Representation Learning for RGB-D-T Salient Object Detection

Cited by: 1
Authors
Qiu, Jiyuan [1 ]
Jiang, Chen [1 ]
Wang, Haowen [1 ]
Affiliations
[1] Tsinghua Univ, Sch Aerosp Engn, Beijing 100084, Peoples R China
Keywords
Feature extraction; Training; Decoding; Computer architecture; Transformers; Representation learning; Object detection; Multimodal hybrid fusion; representation learning; RGB-D-T salient object detection; transformer; NETWORK;
DOI
10.1109/LSP.2024.3465351
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract
Because depth and thermal images are susceptible to environmental interference, researchers have begun combining three modalities for salient object detection (SOD). In this letter, we propose an efficient transformer network (ETFormer) based on multimodal hybrid fusion and representation learning for RGB-D-T SOD. First, unlike most prior works, we design a backbone that extracts information from all three modalities and propose a multi-modal multi-head attention module (MMAM) for feature fusion, which improves network performance while reducing computational redundancy. Second, we reassemble a three-modal dataset, R-D-T ImageNet-1K, to pretrain the network, addressing the problem that the non-RGB modalities are otherwise pretrained on RGB data. Finally, extensive experiments show that the proposed method combines the advantages of the different modalities and achieves better performance than existing methods.
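The abstract describes fusing RGB, depth, and thermal features with a multi-modal multi-head attention module (MMAM). The paper's exact design is not given here; the following is a minimal illustrative sketch of one common way to realize such cross-modal attention fusion in PyTorch, with all names (class, dimensions) being assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn


class MultiModalAttentionFusion(nn.Module):
    """Hypothetical sketch of three-modality attention fusion.

    Tokens from the RGB, depth, and thermal branches are concatenated
    along the sequence axis so that self-attention can exchange
    information across modalities, then the three segments are averaged
    back into a single fused token map.
    """

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, rgb, depth, thermal):
        # Each input: (batch, tokens, dim).
        tokens = torch.cat([rgb, depth, thermal], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        # Merge the three modality segments back to one token map.
        n = rgb.shape[1]
        fused = (fused[:, :n] + fused[:, n:2 * n] + fused[:, 2 * n:]) / 3
        return self.proj(fused)
```

In practice, a module like this would sit between the shared backbone and the decoder; a single attention pass over the concatenated tokens is one plausible way to cut the redundancy of running separate per-modality attention blocks, as the abstract suggests.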
Pages: 2930-2934
Number of pages: 5
Related papers
50 records in total (first 10 listed below)
  • [1] EM-Trans: Edge-Aware Multimodal Transformer for RGB-D Salient Object Detection
    Chen, Geng
    Wang, Qingyue
    Dong, Bo
    Ma, Ruitao
    Liu, Nian
    Fu, Huazhu
    Xia, Yong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (02) : 3175 - 3188
  • [2] Highly Efficient RGB-D Salient Object Detection With Adaptive Fusion and Attention Regulation
    Gao, Haoran
    Wang, Fasheng
    Wang, Mengyin
    Sun, Fuming
    Li, Haojie
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (04) : 3104 - 3118
  • [3] Disentangled Cross-Modal Transformer for RGB-D Salient Object Detection and Beyond
    Chen, Hao
    Shen, Feihong
    Ding, Ding
    Deng, Yongjian
    Li, Chao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 1699 - 1709
  • [4] Transformer Fusion and Pixel-Level Contrastive Learning for RGB-D Salient Object Detection
    Wu, Jiesheng
    Hao, Fangwei
    Liang, Weiyun
    Xu, Jing
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1011 - 1026
  • [5] Revisiting Feature Fusion for RGB-T Salient Object Detection
    Zhang, Qiang
    Xiao, Tonglin
    Huang, Nianchang
    Zhang, Dingwen
    Han, Jungong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (05) : 1804 - 1818
  • [6] Transformer-based difference fusion network for RGB-D salient object detection
    Cui, Zhi-Qiang
    Wang, Feng
    Feng, Zheng-Yong
    JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (06)
  • [7] Self-Supervised Pretraining With Multimodality Representation Enhancement for Salient Object Detection in RGB-D Images
    Gao, Lina
    Liu, Bing
    Fu, Ping
    Xu, Mingzhu
    Zhang, Yonggang
    Huang, Yulong
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2025, 74
  • [8] Multiscale multilevel context and multimodal fusion for RGB-D salient object detection
    Wu, Junwei
    Zhou, Wujie
    Luo, Ting
    Yu, Lu
    Lei, Jingsheng
    SIGNAL PROCESSING, 2021, 178
  • [9] Learning Local-Global Representation for Scribble-Based RGB-D Salient Object Detection via Transformer
    Wang, Yue
    Zhang, Lu
    Zhang, Pingping
    Zhuge, Yunzhi
    Wu, Junfeng
    Yu, Hong
    Lu, Huchuan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (11) : 11592 - 11604
  • [10] Learning Implicit Class Knowledge for RGB-D Co-Salient Object Detection With Transformers
    Zhang, Ni
    Han, Junwei
    Liu, Nian
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4556 - 4570