ETFormer: An Efficient Transformer Based on Multimodal Hybrid Fusion and Representation Learning for RGB-D-T Salient Object Detection

Cited by: 1

Authors:

Qiu, Jiyuan [1]

Jiang, Chen [1]

Wang, Haowen [1]

Affiliations:

[1] Tsinghua Univ, Sch Aerosp Engn, Beijing 100084, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2024 / Vol. 31

Keywords:

Feature extraction; Training; Decoding; Computer architecture; Transformers; Representation learning; Object detection; Multimodal hybrid fusion; representation learning; RGB-D-T salient object detection; transformer; NETWORK;

DOI:

10.1109/LSP.2024.3465351

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology];

Subject classification codes:

0808 ; 0809 ;

Abstract:

Because depth and thermal images are susceptible to environmental interference, researchers have begun to combine three modalities for salient object detection (SOD). In this letter, we propose an efficient transformer network (ETFormer) based on multimodal hybrid fusion and representation learning for RGB-D-T SOD. First, unlike most works, we design a backbone that extracts information from all three modalities, and we propose a multi-modal multi-head attention module (MMAM) for feature fusion, which improves network performance while reducing computational redundancy. Second, we reassemble a three-modal dataset, called R-D-T ImageNet-1K, to pretrain the network, addressing the problem that the other modalities still rely on the RGB modality during pretraining. Finally, extensive experiments show that the proposed method combines the advantages of the different modalities and achieves better performance than existing methods.


Pages: 2930-2934

Page count: 5

