Variational Context-Deformable ConvNets for Indoor Scene Parsing

被引:53
作者
Xiong, Zhitong
Yuan, Yuan [1 ]
Guo, Nianhui
Wang, Qi
机构
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Shaanxi, Peoples R China
来源
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2020年
基金
国家重点研发计划; 中国国家自然科学基金;
关键词
D O I
10.1109/CVPR42600.2020.00405
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Context information is critical for image semantic segmentation. Especially in indoor scenes, the large variation of object scales makes spatial-context an important factor for improving the segmentation performance. Thus, in this paper, we propose a novel variational context-deformable (VCD) module to learn adaptive receptive-field in a structured fashion. Different from standard ConvNets, which share fixed-size spatial context for all pixels, the VCD module learns a deformable spatial-context with the guidance of depth information: depth information provides clues for identifying real local neighborhoods. Specifically, adaptive Gaussian kernels are learned with the guidance of multimodal information. By multiplying the learned Gaussian kernel with standard convolution filters, the VCD module can aggregate flexible spatial context for each pixel during convolution. The main contributions of this work are as follows: 1) a novel VCD module is proposed, which exploits learnable Gaussian kernels to enable feature learning with structured adaptive-context; 2) variational Bayesian probabilistic modeling is introduced for the training of VCD module, which can make it continuous and more stable; 3) a perspective-aware guidance module is designed to take advantage of multi-modal information for RGB-D segmentation. We evaluate the proposed approach on three widely-used datasets, and the performance improvement has shown the effectiveness of the proposed method.
引用
收藏
页码:3991 / 4001
页数:11
相关论文
共 53 条
[1]  
[Anonymous], 2016, PROC CVPR IEEE, DOI DOI 10.1109/CVPR.2016.348
[2]   DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J].
Chen, Liang-Chieh ;
Papandreou, George ;
Kokkinos, Iasonas ;
Murphy, Kevin ;
Yuille, Alan L. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) :834-848
[3]  
Chen LB, 2017, IEEE INT SYMP NANO, P1, DOI 10.1109/NANOARCH.2017.8053709
[4]   Locality-Sensitive Deconvolution Networks with Gated Fusion for RGB-D Indoor Semantic Segmentation [J].
Cheng, Yanhua ;
Cai, Rui ;
Li, Zhiwei ;
Zhao, Xin ;
Huang, Kaiqi .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1475-1483
[5]   SurfConv: Bridging 3D and 2D Convolution for RGBD Images [J].
Chu, Hang ;
Ma, Wei-Chiu ;
Kundu, Kaustav ;
Urtasun, Raquel ;
Fidler, Sanja .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :3002-3011
[6]   The Cityscapes Dataset for Semantic Urban Scene Understanding [J].
Cordts, Marius ;
Omran, Mohamed ;
Ramos, Sebastian ;
Rehfeld, Timo ;
Enzweiler, Markus ;
Benenson, Rodrigo ;
Franke, Uwe ;
Roth, Stefan ;
Schiele, Bernt .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :3213-3223
[7]  
Couprie C., 2013, INT C LEARN REPR ICL
[8]   Deformable Convolutional Networks [J].
Dai, Jifeng ;
Qi, Haozhi ;
Xiong, Yuwen ;
Li, Yi ;
Zhang, Guodong ;
Hu, Han ;
Wei, Yichen .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :764-773
[9]   Fusion of Geometry and Color Information for Scene Segmentation [J].
Dal Mutto, Carlo ;
Zanuttigh, Pietro ;
Cortelazzo, Guido M. .
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2012, 6 (05) :505-521
[10]   Semantic Segmentation of RGBD Images with Mutex Constraints [J].
Deng, Zhuo ;
Todorovic, Sinisa ;
Latecki, Longin Jan .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :1733-1741