Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation

Cited by: 294
Authors
Chen, Xiaokang [1]
Lin, Kwan-Yee [2]
Wang, Jingbo [3]
Wu, Wayne [2]
Qian, Chen [2]
Li, Hongsheng [3]
Zeng, Gang [1]
Affiliations
[1] Peking Univ, Key Lab Machine Percept, MOE, Sch EECS, Beijing, Peoples R China
[2] SenseTime Res, Tai Po, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Sha Tin, Hong Kong, Peoples R China
Source
COMPUTER VISION - ECCV 2020, PT XI | 2020 / Vol. 12356
Funding
National Natural Science Foundation of China;
Keywords
RGB-D semantic segmentation; Cross-modality feature propagation;
DOI
10.1007/978-3-030-58621-8_33
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images because it provides a geometric counterpart to the RGB representation. Most existing works simply assume that depth measurements are accurate and well aligned with the RGB pixels, and model the problem as cross-modal feature fusion to obtain better feature representations and more accurate segmentation. This, however, may not lead to satisfactory results, as actual depth data are generally noisy, which might worsen the accuracy as the networks go deeper. In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately. The key to the proposed architecture is a novel Separation-and-Aggregation Gating operation that jointly filters and recalibrates both representations before cross-modality aggregation. Meanwhile, a Bi-direction Multi-step Propagation strategy is introduced, on the one hand, to help propagate and fuse information between the two modalities, and on the other hand, to preserve their specificity along the long-term propagation process. Besides, our proposed encoder can be easily injected into previous encoder-decoder structures to boost their performance on RGB-D semantic segmentation. Our model consistently outperforms state-of-the-art methods on challenging indoor and outdoor datasets (code for this work is available at https://charlescxk.github.io/).
Pages: 561-577
Page count: 17