Two-Stage Cascaded Decoder for Semantic Segmentation of RGB-D Images

Cited by: 26

Authors:

Yue, Yuchun [1]

Zhou, Wujie [1]

Lei, Jingsheng [1]

Yu, Lu [2]

Affiliations:

[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China

[2] Zhejiang Univ, Coll Informat & Elect Engn, Hangzhou 310027, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2021 / Volume 28 / Issue 28

Funding:

National Natural Science Foundation of China;

Keywords:

Semantics; Image segmentation; Feature extraction; Decoding; Sun; Computer architecture; Training; Deep learning; RGB-D image; semantic segmentation; multimodal feature extraction; multilevel feature fusion;

DOI:

10.1109/LSP.2021.3084855

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Exploiting RGB and depth information can boost the performance of semantic segmentation. However, owing to the differences between RGB images and the corresponding depth maps, such multimodal information should be effectively used and combined. Most existing methods use the same fusion strategy to explore multilevel complementary information at various levels, likely ignoring different feature contributions at various levels for segmentation. To address this problem, we propose a network using a two-stage cascaded decoder (TCD), embedding a detail polishing module, to effectively integrate high- and low-level features and suppress noise from low-level details. Additionally, we introduce a depth filter and fusion module to extract informative regions from depth cues with the guidance of RGB images. The proposed TCD network achieves comparable performance to state-of-the-art RGB-D semantic segmentation methods on the benchmark NYUDv2 and SUN RGB-D datasets.


Pages: 1115-1119

Page count: 5

27 references in total

[1] [Anonymous], 2009, PROC CVPR IEEE, DOI 10.1109/CVPR.2009.5206848
[2] Chen X., 2020, P EUR C COMP VI, P561
[3] He Kaiming, 2015, C COMP VIS PATT REC
[4] Hu XX, 2019, IEEE IMAGE PROC, P1440, DOI 10.1109/ICIP.2019.8803025
[5] Jiang Jindong, 2018, Rednet: Residual encoder-decoder network for indoor rgb-d semantic segmentation, P6
[6] Kong, Shu; Fowlkes, Charless. Recurrent Scene Parsing with Perspective Understanding in the Loop [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 956-965
[7] Lin, Di; Zhang, Ruimao; Ji, Yuanfeng; Li, Ping; Huang, Hui. SCN: Switchable Context Network for Semantic Segmentation of RGB-D Images [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50(03): 1120-1131
[8] Lin, Guosheng; Milan, Anton; Shen, Chunhua; Reid, Ian. RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 5168-5177
[9] Lin, Tsung-Yi; Goyal, Priya; Girshick, Ross; He, Kaiming; Dollar, Piotr. Focal Loss for Dense Object Detection [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42(02): 318-327
[10] Park, Seong-Jin; Hong, Ki-Sang; Lee, Seungyong. RDFNet: RGB-D Multi-level Residual Feature Fusion for Indoor Semantic Segmentation [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 4990-4999
