ETFormer: An Efficient Transformer Based on Multimodal Hybrid Fusion and Representation Learning for RGB-D-T Salient Object Detection

Cited by: 1
Authors
Qiu, Jiyuan [1 ]
Jiang, Chen [1 ]
Wang, Haowen [1 ]
Affiliations
[1] Tsinghua University, School of Aerospace Engineering, Beijing 100084, People's Republic of China
Keywords
Feature extraction; Training; Decoding; Computer architecture; Transformers; Representation learning; Object detection; Multimodal hybrid fusion; representation learning; RGB-D-T salient object detection; transformer; network
DOI
10.1109/LSP.2024.3465351
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Because depth and thermal images are both susceptible to environmental interference, researchers have begun combining all three modalities for salient object detection (SOD). In this letter, we propose an efficient transformer network (ETFormer) based on multimodal hybrid fusion and representation learning for RGB-D-T SOD. First, unlike most prior work, we design a single backbone that extracts features from all three modalities, and we propose a multi-modal multi-head attention module (MMAM) for feature fusion, which improves network performance while reducing computational redundancy. Second, we reassemble a three-modal dataset, R-D-T ImageNet-1K, to pretrain the network, addressing the problem that the depth and thermal branches are otherwise pretrained on RGB data alone. Finally, extensive experiments show that the proposed method combines the advantages of the different modalities and achieves better performance than existing methods.
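The abstract does not describe the MMAM's internal structure. As a hypothetical sketch of the general cross-modal multi-head attention pattern such a fusion module might follow (the function name, the choice of RGB tokens as queries over concatenated depth/thermal tokens, and the residual connection are all illustrative assumptions, not the authors' design):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multimodal_attention_fusion(rgb, depth, thermal, num_heads=4):
    """Hypothetical cross-modal fusion sketch (not the paper's MMAM):
    RGB tokens act as queries attending over the concatenated
    depth + thermal tokens, split across several attention heads."""
    q = rgb                                        # (N_q, C) query tokens
    kv = np.concatenate([depth, thermal], axis=0)  # (N_kv, C) key/value tokens
    n_q, c = q.shape
    d = c // num_heads                             # per-head channel width

    # Split channels into heads: (H, N, d)
    qh = q.reshape(n_q, num_heads, d).transpose(1, 0, 2)
    kh = kv.reshape(kv.shape[0], num_heads, d).transpose(1, 0, 2)

    # Scaled dot-product attention per head: (H, N_q, N_kv)
    attn = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(d))

    # Aggregate values (here values = keys for brevity): (H, N_q, d)
    out = attn @ kh

    # Merge heads back to (N_q, C) and add a residual connection.
    fused = out.transpose(1, 0, 2).reshape(n_q, c)
    return rgb + fused

# Toy usage: 16 tokens per modality, 32 channels.
rgb = np.random.randn(16, 32)
depth = np.random.randn(16, 32)
thermal = np.random.randn(16, 32)
fused = multimodal_attention_fusion(rgb, depth, thermal)
print(fused.shape)  # (16, 32)
```

The sketch only shows the shape bookkeeping of multi-head cross-attention; a real module would add learned query/key/value projections and an output projection.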
Pages: 2930 - 2934
Page count: 5