ETFormer: An Efficient Transformer Based on Multimodal Hybrid Fusion and Representation Learning for RGB-D-T Salient Object Detection

Cited by: 1
Authors
Qiu, Jiyuan [1]
Jiang, Chen [1]
Wang, Haowen [1]
Affiliations
[1] Tsinghua Univ, Sch Aerosp Engn, Beijing 100084, Peoples R China
Keywords
Feature extraction; Training; Decoding; Computer architecture; Transformers; Representation learning; Object detection; Multimodal hybrid fusion; representation learning; RGB-D-T salient object detection; transformer; NETWORK
DOI
10.1109/LSP.2024.3465351
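Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract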
Because depth and thermal images are susceptible to environmental interference, researchers have begun to combine three modalities for salient object detection (SOD). In this letter, we propose an efficient transformer network (ETFormer) based on multimodal hybrid fusion and representation learning for RGB-D-T SOD. First, unlike most works, we design a backbone that extracts information from all three modalities, and propose a multi-modal multi-head attention module (MMAM) for feature fusion, which improves network performance while reducing computational redundancy. Second, we assemble a three-modal dataset called R-D-T ImageNet-1K to pretrain the network, addressing the problem that the other modalities still rely on the RGB modality during pretraining. Finally, extensive experiments show that the proposed method combines the advantages of the different modalities and achieves better performance than existing methods.
Pages: 2930 - 2934
Number of pages: 5
Related papers
50 in total
  • [41] Weighted Guided Optional Fusion Network for RGB-T Salient Object Detection
    Wang, Jie
    Li, Guoqiang
    Shi, Jie
    Xi, Jinwen
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (05)
  • [42] Adaptive interactive network for RGB-T salient object detection with double mapping transformer
    Dong, Feng
    Wang, Yuxuan
    Zhu, Jinchao
    Li, Yuehua
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (20) : 59169 - 59193
  • [43] EATNet: edge-aware and transformer-based network for RGB-D salient object detection
    Xu Liu
    Chenhua Liu
    Xianye Zhou
    Guodong Fan
    Multimedia Systems, 2025, 31 (2)
  • [44] Discriminative unimodal feature selection and fusion for RGB-D salient object detection
    Huang, Nianchang
    Luo, Yongjiang
    Zhang, Qiang
    Han, Jungong
    PATTERN RECOGNITION, 2022, 122
  • [45] Efficient Context-Guided Stacked Refinement Network for RGB-T Salient Object Detection
    Huo, Fushuo
    Zhu, Xuegui
    Zhang, Lei
    Liu, Qifeng
    Shu, Yu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (05) : 3111 - 3124
  • [46] Transformer-based cross-modality interaction guidance network for RGB-T salient object detection
    Luo, Jincheng
    Li, Yongjun
    Li, Bo
    Zhang, Xinru
    Li, Chaoyue
    Chenjin, Zhimin
    He, Jingyi
    Liang, Yifei
    NEUROCOMPUTING, 2024, 600
  • [47] RGB-D Point Cloud Registration Based on Salient Object Detection
    Wan, Teng
    Du, Shaoyi
    Cui, Wenting
    Yao, Runzhao
    Ge, Yuyan
    Li, Ce
    Gao, Yue
    Zheng, Nanning
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (08) : 3547 - 3559
  • [48] ASIF-Net: Attention Steered Interweave Fusion Network for RGB-D Salient Object Detection
    Li, Chongyi
    Cong, Runmin
    Kwong, Sam
    Hou, Junhui
    Fu, Huazhu
    Zhu, Guopu
    Zhang, Dingwen
    Huang, Qingming
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (01) : 88 - 100
  • [49] HFMDNet: Hierarchical Fusion and Multilevel Decoder Network for RGB-D Salient Object Detection
    Luo, Yi
    Shao, Feng
    Xie, Zhengxuan
    Wang, Huizhi
    Chen, Hangwei
    Mu, Baoyang
    Jiang, Qiuping
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 15
  • [50] HierNet: Hierarchical Transformer U-Shape Network for RGB-D Salient Object Detection
    Lv, Pengfei
    Yu, Xiaosheng
    Wang, Junxiang
    Wu, Chengdong
    2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2023 : 1807 - 1811