Two-Stage Cascaded Decoder for Semantic Segmentation of RGB-D Images

Cited by: 26
Authors
Yue, Yuchun [1]
Zhou, Wujie [1]
Lei, Jingsheng [1]
Yu, Lu [2]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
[2] Zhejiang Univ, Coll Informat & Elect Engn, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Image segmentation; Feature extraction; Decoding; Sun; Computer architecture; Training; Deep learning; RGB-D image; semantic segmentation; multimodal feature extraction; multilevel feature fusion
DOI
10.1109/LSP.2021.3084855
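Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract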
Exploiting RGB and depth information can boost the performance of semantic segmentation. However, owing to the differences between RGB images and the corresponding depth maps, such multimodal information should be effectively used and combined. Most existing methods use the same fusion strategy to explore multilevel complementary information at various levels, likely ignoring different feature contributions at various levels for segmentation. To address this problem, we propose a network using a two-stage cascaded decoder (TCD), embedding a detail polishing module, to effectively integrate high- and low-level features and suppress noise from low-level details. Additionally, we introduce a depth filter and fusion module to extract informative regions from depth cues with the guidance of RGB images. The proposed TCD network achieves comparable performance to state-of-the-art RGB-D semantic segmentation methods on the benchmark NYUDv2 and SUN RGB-D datasets.
Pages: 1115-1119
Number of pages: 5