Deep Learning Markov Random Field for Semantic Segmentation

Cited by: 116
Authors
Liu, Ziwei [1]
Li, Xiaoxiao [1]
Luo, Ping [1]
Loy, Chen Change [1]
Tang, Xiaoou [1]
Affiliations
[1] Chinese University of Hong Kong, Department of Information Engineering, Shatin, Hong Kong, People's Republic of China
Keywords
Semantic image/video segmentation; Markov random field; convolutional neural network; video; tracking
DOI
10.1109/TPAMI.2017.2737535
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semantic segmentation tasks can be well modeled by a Markov Random Field (MRF). This paper addresses semantic segmentation by incorporating high-order relations and a mixture of label contexts into the MRF. Unlike previous works that optimized MRFs using iterative algorithms, we solve the MRF with a Convolutional Neural Network (CNN), namely the Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass. Specifically, DPN extends a contemporary CNN to model unary terms, and additional layers are devised to approximate the mean field (MF) algorithm for pairwise terms. DPN has several appealing properties. First, unlike recent works that required many iterations of MF during back-propagation, DPN achieves high performance by approximating one iteration of MF. Second, DPN represents various types of pairwise terms, making many existing models its special cases. Furthermore, pairwise terms in DPN provide a unified framework to encode rich contextual information in high-dimensional data, such as images and videos. Third, DPN makes MF easier to parallelize and speed up, thus enabling efficient inference. DPN is thoroughly evaluated on standard semantic image/video segmentation benchmarks, where a single DPN model yields state-of-the-art segmentation accuracies on the PASCAL VOC 2012, Cityscapes, and CamVid datasets.
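To make the phrase "one iteration of MF" concrete, the following is a minimal NumPy sketch of a single mean-field update on a grid MRF. It is not the DPN architecture: the 4-connected neighbourhood, the Potts compatibility matrix, and the names `softmax`, `potts`, `mean_field_step`, and `weight` are all assumptions chosen for illustration, whereas the paper's layers learn richer pairwise terms.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def potts(num_labels):
    """Hypothetical compatibility matrix: cost 1 if labels differ, else 0."""
    return 1.0 - np.eye(num_labels)


def mean_field_step(unary, compat, weight=1.0):
    """One mean-field update for a 4-connected grid MRF (illustrative only).

    unary  : (H, W, K) array of unary energies (lower means more likely).
    compat : (K, K) array; compat[l2, l1] is the cost of a neighbour taking
             label l2 while the current pixel takes label l1.
    weight : scalar strength of the pairwise term (an assumed parameter).
    Returns the updated (H, W, K) label distribution.
    """
    # Initialise the approximate marginals from the unary term alone.
    q = softmax(-unary)

    # Sum the marginals of the four neighbours; zero padding means pixels
    # outside the image contribute nothing.
    padded = np.pad(q, ((1, 1), (1, 1), (0, 0)))
    neighbours = (
        padded[:-2, 1:-1] + padded[2:, 1:-1]
        + padded[1:-1, :-2] + padded[1:-1, 2:]
    )

    # Expected pairwise cost of each label under the neighbours' marginals.
    pairwise = neighbours @ compat

    # Combine unary and pairwise energies, then renormalise per pixel.
    return softmax(-(unary + weight * pairwise))


if __name__ == "__main__":
    # Toy 3x3 image with 2 labels: every pixel strongly prefers label 0,
    # except the centre, which weakly prefers label 1.
    unary = np.zeros((3, 3, 2))
    unary[..., 1] = 4.0
    unary[1, 1] = [1.0, 0.0]

    before = softmax(-unary).argmax(axis=-1)
    after = mean_field_step(unary, potts(2)).argmax(axis=-1)
    print("centre label before:", before[1, 1])
    print("centre label after: ", after[1, 1])
```

In this toy case the single update pulls the isolated centre pixel toward the label its neighbours agree on, which is the smoothing effect a pairwise term is meant to provide. Every operation here is a local sum followed by a per-pixel softmax, which hints at why such an update can be expressed with convolution-like layers and run in parallel.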
Pages: 1814-1828
Number of pages: 15