ETFormer: An Efficient Transformer Based on Multimodal Hybrid Fusion and Representation Learning for RGB-D-T Salient Object Detection

Cited by: 1
Authors
Qiu, Jiyuan [1 ]
Jiang, Chen [1 ]
Wang, Haowen [1 ]
Affiliations
[1] Tsinghua University, School of Aerospace Engineering, Beijing 100084, People's Republic of China
Keywords
Feature extraction; Training; Decoding; Computer architecture; Transformers; Representation learning; Object detection; Multimodal hybrid fusion; representation learning; RGB-D-T salient object detection; transformer; network
DOI
10.1109/LSP.2024.3465351
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Because depth and thermal images are both susceptible to environmental interference, researchers have begun combining all three modalities for salient object detection (SOD). In this letter, we propose an efficient transformer network (ETFormer) based on multimodal hybrid fusion and representation learning for RGB-D-T SOD. First, unlike most prior work, we design a single backbone that extracts features from all three modalities, and we propose a multi-modal multi-head attention module (MMAM) for feature fusion, which improves network performance while reducing computational redundancy. Second, we reassemble a three-modal dataset, R-D-T ImageNet-1K, to pretrain the network, addressing the problem that the depth and thermal branches are otherwise pretrained on RGB data alone. Finally, extensive experiments show that the proposed method combines the advantages of the different modalities and achieves better performance than existing methods.
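The abstract does not describe the MMAM's internal structure. As a hypothetical sketch of the general cross-modal multi-head attention pattern such a fusion module might follow (the function name, the choice of RGB tokens as queries over concatenated depth/thermal tokens, and the residual connection are all illustrative assumptions, not the authors' design):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multimodal_attention_fusion(rgb, depth, thermal, num_heads=4):
    """Hypothetical cross-modal fusion sketch (not the paper's MMAM):
    RGB tokens act as queries attending over the concatenated
    depth + thermal tokens, split across several attention heads."""
    q = rgb                                        # (N_q, C) query tokens
    kv = np.concatenate([depth, thermal], axis=0)  # (N_kv, C) key/value tokens
    n_q, c = q.shape
    d = c // num_heads                             # per-head channel width

    # Split channels into heads: (H, N, d)
    qh = q.reshape(n_q, num_heads, d).transpose(1, 0, 2)
    kh = kv.reshape(kv.shape[0], num_heads, d).transpose(1, 0, 2)

    # Scaled dot-product attention per head: (H, N_q, N_kv)
    attn = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(d))

    # Aggregate values (here values = keys for brevity): (H, N_q, d)
    out = attn @ kh

    # Merge heads back to (N_q, C) and add a residual connection.
    fused = out.transpose(1, 0, 2).reshape(n_q, c)
    return rgb + fused

# Toy usage: 16 tokens per modality, 32 channels.
rgb = np.random.randn(16, 32)
depth = np.random.randn(16, 32)
thermal = np.random.randn(16, 32)
fused = multimodal_attention_fusion(rgb, depth, thermal)
print(fused.shape)  # (16, 32)
```

The sketch only shows the shape bookkeeping of multi-head cross-attention; a real module would add learned query/key/value projections and an output projection.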
Pages: 2930 - 2934
Page count: 5