HierNet: Hierarchical Transformer U-Shape Network for RGB-D Salient Object Detection

Cited by: 1
Authors
Lv, Pengfei [1 ]
Yu, Xiaosheng [1 ]
Wang, Junxiang [1 ]
Wu, Chengdong [1 ]
Affiliations
[1] Northeastern Univ, Fac Robot Sci & Engn, Shenyang, Peoples R China
Source
2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC | 2023
Funding
National Natural Science Foundation of China;
Keywords
salient object detection; RGB-D; transformer; self-attention;
DOI
10.1109/CCDC58219.2023.10327419
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
With the popularity of depth sensors, research on RGB-D salient object detection (SOD) is thriving. However, owing to the limitations of the external environment and the sensor itself, depth information is often unreliable. To meet this challenge, existing models often purify the depth information with complex convolution and pooling operations. This discards a large amount of useful information along with the noise and reduces the opportunities for multi-modality interaction between RGB and depth. Moreover, as information is gradually lost, the hidden relationships among multi-level features are ignored. To tackle these problems, we propose a Hierarchical Transformer U-Shape Network (HierNet) with three key components: 1) a depth calibration module with a simple structure that provides faithful depth information with minimal information loss, enabling cross-modality, cross-layer information interaction; 2) a set of global-view transformer encoders with multi-head attention that discover the potential coherence between the RGB and depth modalities; with weight sharing, several such encoder sets form a hierarchical transformer embedding module that searches for long-range dependencies across levels; 3) a dual-stream U-shape network as the backbone, exploiting the complementary features of the U-shape architecture. Extensive fair experiments on four challenging datasets demonstrate the outstanding performance of the proposed model compared to state-of-the-art models.
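The abstract's second component can be illustrated in miniature: one multi-head self-attention encoder whose weights are reused across feature levels, applied to concatenated RGB and depth tokens so that attention spans both modalities. This is a hedged sketch of the general idea, not the authors' implementation; all shapes, names, and the NumPy formulation are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SharedTransformerEncoder:
    """One multi-head self-attention block whose weights are reused at every level."""
    def __init__(self, dim, heads, seed=0):
        rng = np.random.default_rng(seed)
        self.heads, self.dh = heads, dim // heads
        # Query/key/value/output projections, shared across all levels.
        self.wq, self.wk, self.wv, self.wo = (
            rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4))

    def __call__(self, tokens):                    # tokens: (n, dim)
        n, dim = tokens.shape
        def split(w):                              # project, then split into heads
            return (tokens @ w).reshape(n, self.heads, self.dh).transpose(1, 0, 2)
        q, k, v = split(self.wq), split(self.wk), split(self.wv)
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(self.dh))  # (heads, n, n)
        out = (attn @ v).transpose(1, 0, 2).reshape(n, dim)
        return tokens + out @ self.wo              # residual connection

dim, heads = 64, 4
encoder = SharedTransformerEncoder(dim, heads)     # one weight set for all levels
for n_tokens in (16, 8, 4):                        # three hypothetical feature levels
    rng = np.random.default_rng(n_tokens)
    rgb = rng.standard_normal((n_tokens, dim))     # stand-ins for RGB features
    depth = rng.standard_normal((n_tokens, dim))   # stand-ins for depth features
    fused = encoder(np.concatenate([rgb, depth]))  # joint RGB-depth attention
    print(fused.shape)                             # (2 * n_tokens, dim)
```

Because every token attends to every other, each RGB token can directly weigh every depth token (and vice versa), which is the cross-modality, long-range interaction the abstract attributes to the transformer encoders.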
Pages: 1807-1811
Number of pages: 5