A survey on deep learning techniques for image and video semantic segmentation

Cited by: 650
Authors
Garcia-Garcia, Alberto [1]
Orts-Escolano, Sergio [1]
Oprea, Sergiu [1]
Villena-Martinez, Victor [1]
Martinez-Gonzalez, Pablo [1]
Garcia-Rodriguez, Jose [1]
Affiliations
[1] Univ Alicante, Percept Lab 3D, Alicante, Spain
Keywords
Semantic segmentation; Deep learning; Scene labeling; OBJECT CLASSES; RECOGNITION; NETWORKS; DATABASE; VISION;
DOI
10.1016/j.asoc.2018.05.018
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image semantic segmentation is of growing interest to computer vision and machine learning researchers. Many emerging applications need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems, to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application related to computer vision, including semantic segmentation and scene understanding. This paper reviews deep learning methods for semantic segmentation applied to various application areas. First, we formulate the semantic segmentation problem and define the terminology of the field along with relevant background concepts. Next, the main datasets and challenges are presented to help researchers decide which best suit their needs and goals. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. We also devote part of the paper to reviewing common loss functions and error metrics for this problem. Quantitative results are then given for the described methods and the datasets on which they were evaluated, followed by a discussion of the results. Finally, we point out a set of promising directions for future work and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques. (C) 2018 Elsevier B.V. All rights reserved.
Pages: 41-65
Number of pages: 25