FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture

Cited by: 577
Authors
Hazirbas, Caner [1]
Ma, Lingni [1]
Domokos, Csaba [1]
Cremers, Daniel [1]
Affiliation
[1] Tech Univ Munich, Munich, Germany
Source
COMPUTER VISION - ACCV 2016, PT I | 2017 / Vol. 10111
DOI
10.1007/978-3-319-54181-5_14
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we address the problem of semantic labeling of indoor scenes from RGB-D data. With the availability of RGB-D cameras, additional depth measurements are expected to improve accuracy. Here we investigate how to incorporate complementary depth information into a semantic segmentation framework using convolutional neural networks (CNNs). Encoder-decoder fully convolutional CNN architectures have recently achieved great success in semantic segmentation. Motivated by this observation, we propose an encoder-decoder network whose encoder consists of two branches that simultaneously extract features from the RGB and depth images and fuse the depth features into the RGB feature maps as the network goes deeper. Comprehensive experimental evaluations demonstrate that the proposed fusion-based architecture achieves results competitive with state-of-the-art methods on the challenging SUN RGB-D benchmark, obtaining 76.27% global accuracy, 48.30% average class accuracy and 37.29% average intersection-over-union score.
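The abstract describes the architecture only at a high level. The toy NumPy sketch below, which is not the authors' implementation, illustrates the idea: two encoder branches process RGB and depth in parallel, after each stage the depth features are fused into the RGB feature maps, and the fused RGB branch feeds a single decoder that outputs a per-pixel label map. Several points are assumptions not stated in the abstract: fusion is taken to be element-wise summation; each "stage" is simplified to a 1x1 convolution + ReLU + 2x2 max pooling instead of stacked 3x3 convolutions; the decoder uses nearest-neighbour upsampling; the weights are random rather than trained; and the 37-class label set is assumed. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1_relu(x, w):
    """Toy stand-in for a conv block: a 1x1 convolution (per-pixel linear map) + ReLU.
    x has shape (C_in, H, W); w has shape (C_out, C_in)."""
    return np.maximum(np.einsum("oc,chw->ohw", w, x), 0.0)


def max_pool2x2(x):
    """2x2 max pooling with stride 2 (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def upsample2x(x):
    """Nearest-neighbour upsampling by a factor of 2 (stand-in for unpooling)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def init_weights(widths, in_ch):
    """He-style random weights for a chain of 1x1 convolutions."""
    weights = []
    for out_ch in widths:
        weights.append(rng.standard_normal((out_ch, in_ch)) * np.sqrt(2.0 / in_ch))
        in_ch = out_ch
    return weights


def fusion_forward(rgb, depth, enc_rgb, enc_depth, dec, classifier):
    """Two-branch encoder with depth-into-RGB fusion, then a single decoder.
    Returns a per-pixel label map of shape (H, W)."""
    x, d = rgb, depth
    for w_rgb, w_depth in zip(enc_rgb, enc_depth):
        x = conv1x1_relu(x, w_rgb)    # RGB branch stage
        d = conv1x1_relu(d, w_depth)  # depth branch stage
        x = x + d  # assumed fusion rule: element-wise sum of depth into RGB
        x, d = max_pool2x2(x), max_pool2x2(d)
    for w in dec:
        x = conv1x1_relu(upsample2x(x), w)  # decoder works on the fused RGB branch only
    scores = np.einsum("kc,chw->khw", classifier, x)  # per-pixel class scores
    return scores.argmax(axis=0)


# Usage on a random 32x32 RGB-D pair with an assumed 37-class label set.
NUM_CLASSES = 37
enc_rgb = init_weights([16, 32, 64], in_ch=3)    # RGB branch
enc_depth = init_weights([16, 32, 64], in_ch=1)  # depth branch, same widths so sums align
dec = init_weights([32, 16, 16], in_ch=64)
classifier = rng.standard_normal((NUM_CLASSES, 16))

rgb = rng.random((3, 32, 32))
depth = rng.random((1, 32, 32))
labels = fusion_forward(rgb, depth, enc_rgb, enc_depth, dec, classifier)
print(labels.shape)  # (32, 32)
```

Summation requires both branches to produce the same number of channels at every stage, which is why `enc_rgb` and `enc_depth` share widths in this sketch; concatenation would remove that constraint at the cost of widening the RGB branch after each fusion.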
Pages: 213-228
Page count: 16