HierNet: Hierarchical Transformer U-Shape Network for RGB-D Salient Object Detection

Cited by: 1
Authors
Lv, Pengfei [1 ]
Yu, Xiaosheng [1 ]
Wang, Junxiang [1 ]
Wu, Chengdong [1 ]
Affiliations
[1] Northeastern Univ, Fac Robot Sci & Engn, Shenyang, Peoples R China
Source
2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC | 2023
Funding
National Natural Science Foundation of China;
Keywords
salient object detection; RGB-D; transformer; self-attention;
DOI
10.1109/CCDC58219.2023.10327419
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
With the popularity of depth sensors, research on RGB-D salient object detection (SOD) is thriving. However, owing to the limitations of the external environment and the sensor itself, depth information is often unreliable. To meet this challenge, existing models often purify the depth information with complex convolution and pooling operations. This discards a large amount of useful information along with the noise and reduces the opportunities for multi-modality interaction between RGB and depth. Moreover, as information is gradually lost, the hidden relationships among multi-level features are ignored. To tackle these problems, we propose a Hierarchical Transformer U-Shape Network (HierNet) with three key components: 1) a depth calibration module with a simple structure that provides faithful depth information with minimal information loss, enabling cross-modality, cross-layer information interaction; 2) a set of global-view transformer encoders with multi-head attention that discover the potential coherence between the RGB and depth modalities; with weight sharing, several such encoder sets form a hierarchical transformer embedding module that searches for long-range dependencies across levels; 3) a dual-stream U-shape network as the backbone, exploiting the complementary features of the U-shape architecture. Extensive fair experiments on four challenging datasets demonstrate the outstanding performance of the proposed model compared to state-of-the-art models.
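The abstract's second component can be illustrated in miniature: one multi-head self-attention encoder whose weights are reused across feature levels, applied to concatenated RGB and depth tokens so that attention spans both modalities. This is a hedged sketch of the general idea, not the authors' implementation; all shapes, names, and the NumPy formulation are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SharedTransformerEncoder:
    """One multi-head self-attention block whose weights are reused at every level."""
    def __init__(self, dim, heads, seed=0):
        rng = np.random.default_rng(seed)
        self.heads, self.dh = heads, dim // heads
        # Query/key/value/output projections, shared across all levels.
        self.wq, self.wk, self.wv, self.wo = (
            rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4))

    def __call__(self, tokens):                    # tokens: (n, dim)
        n, dim = tokens.shape
        def split(w):                              # project, then split into heads
            return (tokens @ w).reshape(n, self.heads, self.dh).transpose(1, 0, 2)
        q, k, v = split(self.wq), split(self.wk), split(self.wv)
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(self.dh))  # (heads, n, n)
        out = (attn @ v).transpose(1, 0, 2).reshape(n, dim)
        return tokens + out @ self.wo              # residual connection

dim, heads = 64, 4
encoder = SharedTransformerEncoder(dim, heads)     # one weight set for all levels
for n_tokens in (16, 8, 4):                        # three hypothetical feature levels
    rng = np.random.default_rng(n_tokens)
    rgb = rng.standard_normal((n_tokens, dim))     # stand-ins for RGB features
    depth = rng.standard_normal((n_tokens, dim))   # stand-ins for depth features
    fused = encoder(np.concatenate([rgb, depth]))  # joint RGB-depth attention
    print(fused.shape)                             # (2 * n_tokens, dim)
```

Because every token attends to every other, each RGB token can directly weigh every depth token (and vice versa), which is the cross-modality, long-range interaction the abstract attributes to the transformer encoders.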
Pages: 1807-1811
Number of pages: 5