Semantic-spatial fusion network for human parsing

Cited by: 14
Authors
Zhang, Xiaomei [1, 2]
Chen, Yingying [1, 2]
Zhu, Bingke [1, 2]
Wang, Jinqiao [1, 2]
Tang, Ming [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
SSFNet; Semantic modulation model; Resolution-aware model; Human parsing; SEGMENTATION; MODELS;
DOI
10.1016/j.neucom.2020.03.096
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, many methods have united low-level and high-level features to generate the desired accurate high-resolution prediction for human parsing. Nevertheless, there exists a semantic-spatial gap between low-level and high-level features in some methods, i.e., high-level features represent more semantics and less spatial details, while low-level ones have less semantics and more spatial details. In this paper, we propose a Semantic-Spatial Fusion Network (SSFNet) for human parsing to shrink the gap, which generates the accurate high-resolution prediction by aggregating multi-resolution features. SSFNet includes two models, a semantic modulation model and a resolution-aware model. The semantic modulation model guides spatial details with semantics and then effectively facilitates the feature fusion, narrowing the gap. The resolution-aware model sufficiently boosts the feature fusion and obtains multi-receptive-fields, which generates reliable and fine-grained high-resolution features for each branch, in bottom-up and top-down processes. Extensive experiments on three public datasets, PASCAL-Person-Part, LIP and PPSS, show that SSFNet achieves significant improvements over state-of-the-art methods. © 2020 Elsevier B.V. All rights reserved.
Pages: 375-383
Number of pages: 9