EM-Trans: Edge-Aware Multimodal Transformer for RGB-D Salient Object Detection

Times cited: 20

Authors:

Chen, Geng [1]

Wang, Qingyue [1]

Dong, Bo [2]

Ma, Ruitao [1]

Liu, Nian [3]

Fu, Huazhu [4]

Xia, Yong [1]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci & Engn, Natl Engn Lab Integrated Aero Space Ground Ocean, Xian 710072, Peoples R China

[2] Zhejiang Univ, Ctr Brain Imaging Sci & Technol, Hangzhou 310027, Peoples R China

[3] Mohamed Bin Zayed Univ Artificial Intelligence, Comp Vis Dept, Abu Dhabi, U Arab Emirates

[4] ASTAR, Inst High Performance Comp IHPC, Singapore 138632, Singapore

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2025 / Vol. 36 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Image edge detection; Feature extraction; Transformers; Object detection; Computational modeling; Task analysis; Decoding; Edge-aware model; multimodal learning; salient object detection (SOD); transformer; NETWORK; FUSION; ATTENTION; FEATURES;

DOI:

10.1109/TNNLS.2024.3358858

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

RGB-D salient object detection (SOD) has gained tremendous attention in recent years. In particular, transformers have been employed and have shown great potential. However, existing transformer models usually overlook vital edge information, a major issue that restricts further improvement of SOD accuracy. To this end, we propose a novel edge-aware RGB-D SOD transformer, called EM-Trans, which explicitly models edge information in a dual-band decomposition framework. Specifically, we employ two parallel decoder networks to learn high-frequency edge features and low-frequency body features from the low- and high-level features, respectively, extracted by a two-stream multimodal backbone network. Next, we propose a cross-attention complementarity exploration module that enriches the edge/body features by exploiting complementary information across modalities. The refined features are then fed into our proposed color-hint guided fusion module, which enhances the depth features and fuses the multimodal features. Finally, the resulting features are fused by our deeply supervised progressive fusion module, which progressively integrates edge and body features to predict saliency maps. By explicitly modeling edge information, our model overcomes this limitation of existing methods and improves RGB-D SOD accuracy. Extensive experiments on benchmark datasets demonstrate that EM-Trans is an effective RGB-D SOD framework that outperforms current state-of-the-art models both quantitatively and qualitatively. A further extension to RGB-T SOD demonstrates the promising potential of our model for various multimodal SOD tasks.
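The abstract says edge/body features are enriched by a cross-attention complementarity exploration module, but it does not give the module's internal design. The following is therefore only a minimal NumPy sketch of generic scaled dot-product cross-attention between two modalities, not the EM-Trans module itself. It assumes feature maps flattened to (tokens x channels), randomly initialised projection matrices standing in for learned weights, a residual connection, and attention applied in both directions (RGB attending to depth and vice versa). The names `cross_modal_attention`, `w_q`, `w_k`, and `w_v` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query_feat, context_feat, w_q, w_k, w_v):
    """Generic cross-attention (hypothetical helper, not the paper's module).

    query_feat:   (N, C) tokens of the modality being enriched, e.g. RGB edge features
    context_feat: (M, C) tokens of the other modality, e.g. depth edge features
    w_q, w_k, w_v: (C, C) projections (random here; learned in a real model)
    Returns (N, C): the query tokens plus information gathered from the context.
    """
    q = query_feat @ w_q                      # (N, C) queries from one modality
    k = context_feat @ w_k                    # (M, C) keys from the other modality
    v = context_feat @ w_v                    # (M, C) values from the other modality
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, M) scaled similarities
    attn = softmax(scores, axis=-1)           # each query's weights sum to 1
    return query_feat + attn @ v              # residual keeps the original features


rng = np.random.default_rng(0)
C = 8
rgb = rng.standard_normal((16, C))    # assumed 4x4 RGB feature map, flattened
depth = rng.standard_normal((16, C))  # assumed 4x4 depth feature map, flattened
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))

# Assumed bidirectional use: each modality attends to the other.
rgb_enriched = cross_modal_attention(rgb, depth, w_q, w_k, w_v)
depth_enriched = cross_modal_attention(depth, rgb, w_q, w_k, w_v)
print(rgb_enriched.shape, depth_enriched.shape)
```

The residual connection means that when the attended values carry no useful signal, the original single-modality features pass through unchanged. This is one common way to make a cross-modal enrichment step safe to stack.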


Pages: 3175-3188

Number of pages: 14

Cited references: 116
