Enhanced Context Learning with Transformer for Human Parsing

Cited by: 1
Authors
Song, Jingya [1 ,2 ,3 ]
Shi, Qingxuan [1 ,2 ,3 ]
Li, Yihang [1 ,2 ,3 ]
Yang, Fang [1 ,2 ,3 ]
Affiliations
[1] Hebei Univ, Sch Cyber Secur & Comp, Baoding 071002, Peoples R China
[2] Hebei Univ, Hebei Machine Vis Engn Res Ctr, Baoding 071002, Peoples R China
[3] Hebei Univ, Inst Intelligent Image & Document Informat Proc, Baoding 071002, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 15
Keywords
human parsing; semantic segmentation; deep learning; SEGMENTATION;
DOI
10.3390/app12157821
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Human parsing is a fine-grained human semantic segmentation task in computer vision. Because of occlusion, diverse poses, and the similar appearance of different body parts and clothing, human parsing must pay particular attention to learning context information. Based on this observation, we enhance the learning of global and local information to obtain more accurate human parsing results. In this paper, we introduce a Global Transformer Module (GTM) that uses a self-attention mechanism to capture long-range dependencies and extract context information effectively. Moreover, we design a Detailed Feature Enhancement (DFE) architecture to exploit spatial semantics for small targets. The low-level visual features from CNN intermediate layers are enhanced with channel and spatial attention. In addition, we adopt an edge detection module to refine the prediction. We conducted extensive experiments on three datasets (LIP, ATR, and Fashion Clothing) to show the effectiveness of our method, which achieves 54.55% mIoU on the LIP dataset, an average F-1 score of 80.26% on the ATR dataset, and an average F-1 score of 55.19% on the Fashion Clothing dataset.
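For readers unfamiliar with the mIoU figure quoted in the abstract, the following is a minimal sketch of how mean intersection-over-union can be computed for a single pair of label maps. It is not taken from the paper: the function name `mean_iou`, the nested-list input format, and the choice to skip classes absent from both maps are assumptions made for illustration. Benchmark evaluations usually accumulate the counts over the whole dataset before averaging, which this sketch does not do.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Mean IoU over the classes that occur in pred or target (illustrative only)."""
    intersection = [0] * num_classes  # pixels where pred and target agree on class c
    union = [0] * num_classes         # pixels labeled c in pred or in target

    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                intersection[p] += 1
                union[p] += 1
            else:
                # A mismatched pixel enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1

    # Assumption: classes missing from both maps are left out of the average.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps with three classes (hypothetical data).
pred = [[0, 0, 1], [1, 2, 2]]
target = [[0, 1, 1], [1, 2, 0]]
print(mean_iou(pred, target, num_classes=3))
```

On the toy maps the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.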
Pages: 16
Related papers
50 in total
  • [41] Fully Convolutional Neural Network with Relation Aware Context Information for Image Parsing
    Azam, Basim
    Mandal, Ranju
    Verma, Brijesh
    2021 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA 2021), 2021, : 127 - 132
  • [42] Semantic-spatial fusion network for human parsing
    Zhang, Xiaomei
    Chen, Yingying
    Zhu, Bingke
    Wang, Jinqiao
    Tang, Ming
    NEUROCOMPUTING, 2020, 402 : 375 - 383
  • [43] Grammar-Induced Wavelet Network for Human Parsing
    Zhang, Xiaomei
    Chen, Yingying
    Tang, Ming
    Lei, Zhen
    Wang, Jinqiao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4502 - 4514
  • [44] Human Parsing via Shape Boltzmann Machine Networks
    Wang, Qiurui
    Yuan, Chun
    Huang, Feiyue
    Wang, Chengjie
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2015, PT I, 2015, 9314 : 653 - 663
  • [45] Kinematic skeleton graph augmented network for human parsing
    Liu, Jinde
    Zhang, Zhang
    Shan, Caifeng
    Tan, Tieniu
    NEUROCOMPUTING, 2020, 413 : 457 - 470
  • [46] FCGNet: Foreground and Class Guided Network for human parsing
    Jang, Jaehyuk
    Wang, Yooseung
    Kim, Changick
    PATTERN RECOGNITION, 2025, 157
  • [47] Hierarchical Scene Parsing by Weakly Supervised Learning with Image Descriptions
    Zhang, Ruimao
    Lin, Liang
    Wang, Guangrun
    Wang, Meng
    Zuo, Wangmeng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2019, 41 (03) : 596 - 610
  • [48] Learning deep representations for semantic image parsing: a comprehensive overview
    Huang, Lili
    Peng, Jiefeng
    Zhang, Ruimao
    Li, Guanbin
    Lin, Liang
    FRONTIERS OF COMPUTER SCIENCE, 2018, 12 (05) : 840 - 857
  • [49] Video scene parsing: An overview of deep learning methods and datasets
    Yan, Xiyu
    Gong, Huihui
    Jiang, Yong
    Xia, Shu-Tao
    Zheng, Feng
    You, Xinge
    Shao, Ling
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2020, 201
  • [50] From Simple to Complex Scenes: Learning Robust Feature Representations for Accurate Human Parsing
    Liu, Yunan
    Wang, Chunpeng
    Lu, Mingyu
    Yang, Jian
    Gui, Jie
    Zhang, Shanshan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5449 - 5462