Semantic-spatial fusion network for human parsing

Cited by: 14

Authors:

Zhang, Xiaomei [1,2]

Chen, Yingying [1,2]

Zhu, Bingke [1,2]

Wang, Jinqiao [1,2]

Tang, Ming [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China

Source:

NEUROCOMPUTING | 2020 / Vol. 402

Funding:

National Natural Science Foundation of China;

Keywords:

SSFNet; Semantic modulation model; Resolution-aware model; Human parsing; SEGMENTATION; MODELS;

DOI:

10.1016/j.neucom.2020.03.096

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, many methods have united low-level and high-level features to generate the desired accurate high-resolution prediction for human parsing. Nevertheless, there exists a semantic-spatial gap between low-level and high-level features in some methods, i.e., high-level features represent more semantics and less spatial details, while low-level ones have less semantics and more spatial details. In this paper, we propose a Semantic-Spatial Fusion Network (SSFNet) for human parsing to shrink the gap, which generates the accurate high-resolution prediction by aggregating multi-resolution features. SSFNet includes two models, a semantic modulation model and a resolution-aware model. The semantic modulation model guides spatial details with semantics and then effectively facilitates the feature fusion, narrowing the gap. The resolution-aware model sufficiently boosts the feature fusion and obtains multi-receptive-fields, which generates reliable and fine-grained high-resolution features for each branch, in bottom-up and top-down processes. Extensive experiments on three public datasets, PASCAL-Person-Part, LIP and PPSS, show that SSFNet achieves significant improvements over state-of-the-art methods. (C) 2020 Elsevier B.V. All rights reserved.
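
The abstract does not spell out how the semantic modulation model combines features, so the following is only an illustrative sketch of the general idea of letting coarse, semantically rich features guide fine, spatially detailed ones. The gating form (a sigmoid of the upsampled high-level map), the single-channel grids, and the function names `upsample_nearest` and `semantic_gated_fusion` are all assumptions made for illustration and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function, used here as a soft gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a 2-D grid given as a list of rows."""
    out = []
    for row in feat:
        # Repeat every value `factor` times along the width...
        wide = [v for v in row for _ in range(factor)]
        # ...and repeat the widened row `factor` times along the height.
        out.extend([list(wide) for _ in range(factor)])
    return out


def semantic_gated_fusion(low, high, factor):
    """Hypothetical fusion of a fine low-level map with a coarse high-level map.

    Assumed form (not from the paper):
        fused = sigmoid(up(high)) * low + up(high)
    so the semantic map both gates the spatial detail and is carried forward.
    """
    high_up = upsample_nearest(high, factor)
    if len(high_up) != len(low) or len(high_up[0]) != len(low[0]):
        raise ValueError("upsampled high-level map must match low-level map size")
    return [
        [sigmoid(h) * l + h for l, h in zip(low_row, high_row)]
        for low_row, high_row in zip(low, high_up)
    ]


if __name__ == "__main__":
    low = [[1.0] * 4 for _ in range(4)]   # 4x4 fine-resolution features
    high = [[0.0, 2.0], [-2.0, 0.0]]      # 2x2 coarse semantic features
    for row in semantic_gated_fusion(low, high, 2):
        print(row)
```

A real network would operate on multi-channel tensors with learned convolutions; the sketch only shows the data flow of upsampling, gating, and merging under the stated assumptions.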


Pages: 375-383

Page count: 9

References: 44 in total

