Indoor scene understanding via RGB-D image segmentation employing depth-based CNN and CRFs

Cited by: 0
Authors
Wei Li
Junhua Gu
Yongfeng Dong
Yao Dong
Jungong Han
Affiliations
[1] Hebei University of Technology, School of Electrical Engineering
[2] Hebei University of Technology, State Key Laboratory of Reliability and Intelligence of Electrical Equipment
[3] Hebei University of Technology, Key Laboratory of Electromagnetic Field and Electrical Apparatus Reliability of Hebei Province
[4] Hebei University of Technology, School of Artificial Intelligence
[5] Key Laboratory of Big Data Computing, Hebei
[6] Lancaster University, School of Computing and Communications
Source
Multimedia Tools and Applications | 2020, Vol. 79
Keywords
Semantic segmentation; CNNs; RGB-D; Fully-connected conditional random field
DOI
Not available
Abstract
With the availability of low-cost depth-visual sensing devices such as the Microsoft Kinect, there is growing interest in indoor environment understanding, at the core of which lies semantic segmentation of RGB-D images. The latest research shows that convolutional neural networks (CNNs) still dominate the image semantic segmentation field. However, the down-sampling operations applied during CNN training lead to unclear segmentation boundaries and poor classification accuracy. To address this problem, we propose a novel end-to-end deep architecture, termed FuseCRFNet, which seamlessly incorporates a fully-connected Conditional Random Field (CRF) model into a depth-based CNN framework. The proposed segmentation method exploits pixel-to-pixel relationships to increase the accuracy of image semantic segmentation. More importantly, we formulate the CRF as one of the layers in FuseCRFNet, so that it refines the coarse segmentation in the forward pass and, at the same time, propagates errors backward to facilitate training. The performance of FuseCRFNet is evaluated on the SUN RGB-D dataset, and the results show that the proposed algorithm outperforms existing semantic segmentation algorithms by at least 2% in accuracy, further verifying its effectiveness.
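To make the CRF-as-a-layer idea in the abstract concrete, the sketch below shows one way a differentiable mean-field refinement step can be appended to the coarse logits of a segmentation CNN. It is a minimal, hypothetical illustration, not the authors' FuseCRFNet code: the class name MeanFieldCRFLayer, the kernel size, and the fixed Gaussian spatial kernel (used here in place of the full bilateral filtering over RGB-D appearance and depth features) are all assumptions made for the example.

# Hypothetical sketch of a CRF-as-a-layer refinement step (not the authors' implementation).
# It runs a few differentiable mean-field updates on the coarse logits produced by a
# segmentation CNN, so gradients can flow back through the refinement during training.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MeanFieldCRFLayer(nn.Module):
    """Simplified mean-field inference for a dense CRF.

    Assumption: the pairwise term is approximated by a fixed Gaussian spatial
    kernel (a depthwise blur) followed by a learned label-compatibility
    transform, instead of full bilateral filtering on RGB-D features.
    """

    def __init__(self, num_classes: int, num_iters: int = 5,
                 kernel_size: int = 7, sigma: float = 2.0):
        super().__init__()
        self.num_iters = num_iters
        self.num_classes = num_classes
        self.padding = kernel_size // 2
        # 1x1 convolution acts as the learned label-compatibility matrix mu(l, l').
        self.compatibility = nn.Conv2d(num_classes, num_classes, kernel_size=1, bias=False)
        # Fixed Gaussian kernel used as the approximate spatial message-passing filter.
        coords = torch.arange(kernel_size) - kernel_size // 2
        g = torch.exp(-coords.float() ** 2 / (2.0 * sigma ** 2))
        kernel = torch.outer(g, g)
        kernel = kernel / kernel.sum()
        self.register_buffer(
            "spatial_kernel",
            kernel.expand(num_classes, 1, kernel_size, kernel_size).clone())

    def forward(self, unary_logits: torch.Tensor) -> torch.Tensor:
        q = F.softmax(unary_logits, dim=1)          # initial marginals from the CNN
        for _ in range(self.num_iters):
            # Message passing: smooth each class map with the Gaussian kernel.
            msg = F.conv2d(q, self.spatial_kernel,
                           padding=self.padding, groups=self.num_classes)
            # Compatibility transform, then combine with the unary term and renormalize.
            pairwise = self.compatibility(msg)
            q = F.softmax(unary_logits - pairwise, dim=1)
        return q


# Example: refine coarse logits from any backbone (random tensors stand in here
# for the output of the depth-based CNN; SUN RGB-D segmentation uses 37 classes).
coarse_logits = torch.randn(1, 37, 120, 160)
crf = MeanFieldCRFLayer(num_classes=37)
refined = crf(coarse_logits)   # shape (1, 37, 120, 160), class probabilities per pixel

Because every operation in the update loop is differentiable, gradients flow from the refined output back into the backbone, which is the property the abstract relies on for end-to-end training of the combined CNN-CRF model.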
Pages: 35475-35489
Page count: 14