ETFormer: An Efficient Transformer Based on Multimodal Hybrid Fusion and Representation Learning for RGB-D-T Salient Object Detection

Cited by: 1
Authors
Qiu, Jiyuan [1 ]
Jiang, Chen [1 ]
Wang, Haowen [1 ]
Affiliations
[1] Tsinghua Univ, Sch Aerosp Engn, Beijing 100084, Peoples R China
Keywords
Feature extraction; Training; Decoding; Computer architecture; Transformers; Representation learning; Object detection; Multimodal hybrid fusion; representation learning; RGB-D-T salient object detection; transformer; NETWORK;
DOI
10.1109/LSP.2024.3465351
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification codes
0808 ; 0809 ;
Abstract
Because depth and thermal images are susceptible to environmental interference, researchers have begun to combine three modalities for salient object detection (SOD). In this letter, we propose an efficient transformer network (ETFormer) based on multimodal hybrid fusion and representation learning for RGB-D-T SOD. First, unlike most prior works, we design a backbone that extracts information from all three modalities, and we propose a multi-modal multi-head attention module (MMAM) for feature fusion, which improves network performance while reducing computational redundancy. Second, we reassemble a three-modal dataset, R-D-T ImageNet-1K, to pretrain the network, addressing the problem that the non-RGB modalities are otherwise still pretrained on RGB data. Finally, extensive experiments show that the proposed method combines the advantages of the different modalities and achieves better performance than existing methods.
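The record does not specify the internals of the MMAM. As a rough illustration of the general idea of multi-head-style cross-modal attention fusion, here is a minimal NumPy sketch; the function name, the choice of RGB tokens as queries, and the single-head simplification are assumptions for illustration, not the authors' design:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multimodal_attention_fusion(rgb, depth, thermal):
    """Fuse three modality token sets (N tokens x d dims each).

    Queries come from the RGB tokens; keys/values are the concatenation
    of all three modalities, so each RGB token attends across modalities.
    (Hypothetical single-head simplification of an MMAM-like module.)
    """
    kv = np.concatenate([rgb, depth, thermal], axis=0)  # (3N, d)
    d = rgb.shape[1]
    scores = rgb @ kv.T / np.sqrt(d)                    # (N, 3N)
    weights = softmax(scores, axis=-1)                  # rows sum to 1
    return weights @ kv                                 # fused (N, d)

N, d = 4, 8
rng = np.random.default_rng(0)
fused = multimodal_attention_fusion(rng.normal(size=(N, d)),
                                    rng.normal(size=(N, d)),
                                    rng.normal(size=(N, d)))
print(fused.shape)  # (4, 8)
```

Sharing one attention computation over concatenated modality tokens, rather than running a separate attention block per modality pair, is one plausible way such a module could reduce redundant computation.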
Pages: 2930-2934
Page count: 5
Related papers
50 records in total
  • [31] Saliency Prototype for RGB-D and RGB-T Salient Object Detection
    Zhang, Zihao
    Wang, Jie
    Han, Yahong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3696 - 3705
  • [32] Transformer-based Adaptive Interactive Promotion Network for RGB-T Salient Object Detection
    Zhu, Jinchao
    Zhang, Xiaoyu
    Dong, Feng
    Yan, Siyu
    Meng, Xianbang
    Li, Yuehua
    Tan, Panlong
    2022 34TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2022, : 1989 - 1994
  • [33] STAMF: Synergistic transformer and mamba fusion network for RGB-Polarization based underwater salient object detection
    Ma, Qianwen
    Li, Xiaobo
    Li, Bincheng
    Zhu, Zhen
    Wu, Jing
    Huang, Feng
    Hu, Haofeng
    INFORMATION FUSION, 2025, 122
  • [34] Aggregate interactive learning for RGB-D salient object detection
    Wu, Jingyu
    Sun, Fuming
    Xu, Rui
    Meng, Jie
    Wang, Fasheng
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 195
  • [35] ECFFNet: Effective and Consistent Feature Fusion Network for RGB-T Salient Object Detection
    Zhou, Wujie
    Guo, Qinling
    Lei, Jingsheng
    Yu, Lu
    Hwang, Jenq-Neng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (03) : 1224 - 1235
  • [36] Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection
    Wang, Shuaihui
    Jiang, Fengyi
    Xu, Boqian
    SENSORS, 2023, 23 (21)
  • [37] RGB-D Salient Object Detection via Joint Learning and Multi-feature Fusion
    Chen, Peng
    Li, BenHang
    2022 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, COMPUTER VISION AND MACHINE LEARNING (ICICML), 2022, : 547 - 552
  • [38] TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
    Liu, Zhengyi
    Wang, Yuan
    Tu, Zhengzheng
    Xiao, Yun
    Tang, Bin
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4481 - 4490
  • [39] Discriminative Cross-Modal Transfer Learning and Densely Cross-Level Feedback Fusion for RGB-D Salient Object Detection
    Chen, Hao
    Li, Youfu
    Su, Dan
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (11) : 4808 - 4820
  • [40] Masked Visual Pre-training for RGB-D and RGB-T Salient Object Detection
    Qi, Yanyu
    Guo, Ruohao
    Li, Zhenbo
    Niu, Dantong
    Qu, Liao
    PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 49 - 66