Semantic-spatial fusion network for human parsing

Cited by: 14
Authors
Zhang, Xiaomei [1 ,2 ]
Chen, Yingying [1 ,2 ]
Zhu, Bingke [1 ,2 ]
Wang, Jinqiao [1 ,2 ]
Tang, Ming [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
SSFNet; Semantic modulation model; Resolution-aware model; Human parsing; SEGMENTATION; MODELS;
DOI
10.1016/j.neucom.2020.03.096
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, many methods have united low-level and high-level features to generate accurate high-resolution predictions for human parsing. Nevertheless, a semantic-spatial gap exists between low-level and high-level features in some methods: high-level features carry more semantics but fewer spatial details, while low-level features carry less semantics but more spatial details. In this paper, we propose a Semantic-Spatial Fusion Network (SSFNet) for human parsing that shrinks this gap by aggregating multi-resolution features into an accurate high-resolution prediction. SSFNet comprises two models, a semantic modulation model and a resolution-aware model. The semantic modulation model guides spatial details with semantics and thereby facilitates feature fusion, narrowing the gap. The resolution-aware model further boosts the feature fusion and captures multiple receptive fields, generating reliable, fine-grained high-resolution features for each branch in bottom-up and top-down processes. Extensive experiments on three public datasets, PASCAL-Person-Part, LIP and PPSS, show that SSFNet achieves significant improvements over state-of-the-art methods. (C) 2020 Elsevier B.V. All rights reserved.
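To make the fusion idea concrete, below is a minimal sketch of semantic-guided fusion between a high-resolution low-level feature map and a low-resolution high-level one, in the spirit of the semantic modulation model described in the abstract. The module name `SemanticModulationFusion`, the per-channel scale/shift modulation scheme, and all layer choices are assumptions for illustration, not the paper's exact design.

```python
# Hypothetical sketch: high-level semantics modulate low-level spatial details
# before fusion. Not the authors' implementation; names and design are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticModulationFusion(nn.Module):
    """Fuse low-level spatial features under guidance from high-level semantics."""

    def __init__(self, low_channels: int, high_channels: int, out_channels: int):
        super().__init__()
        # Project both inputs to a common channel width.
        self.low_proj = nn.Conv2d(low_channels, out_channels, kernel_size=1)
        self.high_proj = nn.Conv2d(high_channels, out_channels, kernel_size=1)
        # High-level semantics predict a per-channel scale and shift (assumed form).
        self.to_scale = nn.Conv2d(out_channels, out_channels, kernel_size=1)
        self.to_shift = nn.Conv2d(out_channels, out_channels, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        low = self.low_proj(low)
        # Upsample the coarse semantic features to the low-level resolution.
        high = self.high_proj(high)
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        # Modulate spatial details with semantics, then refine the fused map.
        scale = torch.sigmoid(self.to_scale(high))
        shift = self.to_shift(high)
        return self.fuse(low * scale + shift)


if __name__ == "__main__":
    low = torch.randn(1, 256, 64, 64)    # low-level, high-resolution features
    high = torch.randn(1, 512, 16, 16)   # high-level, low-resolution features
    fused = SemanticModulationFusion(256, 512, 256)(low, high)
    print(fused.shape)  # torch.Size([1, 256, 64, 64])
```

In this sketch the fused output keeps the low-level resolution, so repeating such a block across branches would yield the multi-resolution, high-resolution features the abstract describes; the exact bottom-up/top-down arrangement is detailed only in the paper itself.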
Pages: 375-383
Number of pages: 9