ETFormer: An Efficient Transformer Based on Multimodal Hybrid Fusion and Representation Learning for RGB-D-T Salient Object Detection
Cited by: 1
Authors:
Qiu, Jiyuan [1]; Jiang, Chen [1]; Wang, Haowen [1]
Affiliations:
[1] Tsinghua Univ, Sch Aerosp Engn, Beijing 100084, Peoples R China
Keywords:
Feature extraction; Training; Decoding; Computer architecture; Transformers; Representation learning; Object detection; Multimodal hybrid fusion; representation learning; RGB-D-T salient object detection; transformer; NETWORK
DOI: 10.1109/LSP.2024.3465351
CLC (Chinese Library Classification):
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline codes:
0808; 0809
Abstract:
Because depth and thermal images are both susceptible to environmental interference, researchers have begun to combine the three modalities for salient object detection (SOD). In this letter, we propose an efficient transformer network (ETFormer) based on multimodal hybrid fusion and representation learning for RGB-D-T SOD. First, unlike most prior work, we design a backbone that extracts features from all three modalities and propose a multi-modal multi-head attention module (MMAM) for feature fusion, which improves network performance while reducing computational redundancy. Second, we reassemble a three-modal dataset, R-D-T ImageNet-1K, to pretrain the network, addressing the problem that the depth and thermal branches would otherwise be pretrained on RGB data alone. Finally, extensive experiments show that the proposed method combines the advantages of the different modalities and achieves better performance than existing methods.
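The abstract does not specify the internals of the MMAM, so the following is only an illustrative sketch of one common way to fuse three modalities with multi-head attention: tokens from the RGB stream attend over the concatenated RGB, depth, and thermal tokens, followed by a residual connection. All names, shapes, and the residual design here are assumptions, not the paper's actual architecture.

```python
# Illustrative (hypothetical) multi-modal multi-head attention fusion.
# The real MMAM design in ETFormer is not described in the abstract;
# shapes and the fusion rule below are assumptions for demonstration.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_three_modalities(rgb, depth, thermal, num_heads=4):
    """RGB tokens (N, C) attend over the concatenation of all three
    modality token sets (3N, C); the attended output is added back
    to the RGB stream as a residual."""
    n, c = rgb.shape
    assert c % num_heads == 0, "channels must divide evenly across heads"
    d = c // num_heads
    kv = np.concatenate([rgb, depth, thermal], axis=0)        # (3N, C)
    q = rgb.reshape(n, num_heads, d).transpose(1, 0, 2)       # (H, N, d)
    k = kv.reshape(3 * n, num_heads, d).transpose(1, 0, 2)    # (H, 3N, d)
    v = k
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))     # (H, N, 3N)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, c)         # (N, C)
    return rgb + out                                          # residual fusion

rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 32))
dep = rng.standard_normal((16, 32))
thr = rng.standard_normal((16, 32))
fused = fuse_three_modalities(rgb, dep, thr)
print(fused.shape)  # (16, 32)
```

A single shared attention over the concatenated key/value set is cheaper than three pairwise cross-attentions, which is one plausible reading of the abstract's claim that the fusion module reduces computational redundancy.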
Pages: 2930-2934
Page count: 5