Semantic-spatial fusion network for human parsing

Cited by: 14
Authors
Zhang, Xiaomei [1 ,2 ]
Chen, Yingying [1 ,2 ]
Zhu, Bingke [1 ,2 ]
Wang, Jinqiao [1 ,2 ]
Tang, Ming [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
SSFNet; Semantic modulation model; Resolution-aware model; Human parsing; SEGMENTATION; MODELS;
DOI
10.1016/j.neucom.2020.03.096
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, many methods have united low-level and high-level features to generate accurate high-resolution predictions for human parsing. Nevertheless, a semantic-spatial gap exists between low-level and high-level features in some methods: high-level features carry more semantics but fewer spatial details, while low-level features carry less semantics but more spatial details. In this paper, we propose a Semantic-Spatial Fusion Network (SSFNet) for human parsing that shrinks this gap by aggregating multi-resolution features into an accurate high-resolution prediction. SSFNet comprises two models, a semantic modulation model and a resolution-aware model. The semantic modulation model guides spatial details with semantics and thereby facilitates feature fusion, narrowing the gap. The resolution-aware model further boosts the feature fusion and captures multiple receptive fields, generating reliable, fine-grained high-resolution features for each branch in bottom-up and top-down processes. Extensive experiments on three public datasets, PASCAL-Person-Part, LIP and PPSS, show that SSFNet achieves significant improvements over state-of-the-art methods. (C) 2020 Elsevier B.V. All rights reserved.
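To make the fusion idea concrete, below is a minimal sketch of semantic-guided fusion between a high-resolution low-level feature map and a low-resolution high-level one, in the spirit of the semantic modulation model described in the abstract. The module name `SemanticModulationFusion`, the per-channel scale/shift modulation scheme, and all layer choices are assumptions for illustration, not the paper's exact design.

```python
# Hypothetical sketch: high-level semantics modulate low-level spatial details
# before fusion. Not the authors' implementation; names and design are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticModulationFusion(nn.Module):
    """Fuse low-level spatial features under guidance from high-level semantics."""

    def __init__(self, low_channels: int, high_channels: int, out_channels: int):
        super().__init__()
        # Project both inputs to a common channel width.
        self.low_proj = nn.Conv2d(low_channels, out_channels, kernel_size=1)
        self.high_proj = nn.Conv2d(high_channels, out_channels, kernel_size=1)
        # High-level semantics predict a per-channel scale and shift (assumed form).
        self.to_scale = nn.Conv2d(out_channels, out_channels, kernel_size=1)
        self.to_shift = nn.Conv2d(out_channels, out_channels, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        low = self.low_proj(low)
        # Upsample the coarse semantic features to the low-level resolution.
        high = self.high_proj(high)
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        # Modulate spatial details with semantics, then refine the fused map.
        scale = torch.sigmoid(self.to_scale(high))
        shift = self.to_shift(high)
        return self.fuse(low * scale + shift)


if __name__ == "__main__":
    low = torch.randn(1, 256, 64, 64)    # low-level, high-resolution features
    high = torch.randn(1, 512, 16, 16)   # high-level, low-resolution features
    fused = SemanticModulationFusion(256, 512, 256)(low, high)
    print(fused.shape)  # torch.Size([1, 256, 64, 64])
```

In this sketch the fused output keeps the low-level resolution, so repeating such a block across branches would yield the multi-resolution, high-resolution features the abstract describes; the exact bottom-up/top-down arrangement is detailed only in the paper itself.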
Pages: 375-383
Number of pages: 9