Enhanced Context Learning with Transformer for Human Parsing

Cited by: 1

Authors:

Song, Jingya ^{[1,2,3]}

Shi, Qingxuan ^{[1,2,3]}

Li, Yihang ^{[1,2,3]}

Yang, Fang ^{[1,2,3]}

Affiliations:

[1] Hebei Univ, Sch Cyber Secur & Comp, Baoding 071002, Peoples R China

[2] Hebei Univ, Hebei Machine Vis Engn Res Ctr, Baoding 071002, Peoples R China

[3] Hebei Univ, Inst Intelligent Image & Document Informat Proc, Baoding 071002, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 15

Keywords:

human parsing; semantic segmentation; deep learning; SEGMENTATION;

DOI:

10.3390/app12157821

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Human parsing is a fine-grained human semantic segmentation task in the field of computer vision. Due to the challenges of occlusion, diverse poses and the similar appearance of different body parts and clothing, human parsing requires more attention to learning context information. Based on this observation, we enhance the learning of global and local information to obtain more accurate human parsing results. In this paper, we introduce a Global Transformer Module (GTM) that uses a self-attention mechanism to capture long-range dependencies for effectively extracting context information. Moreover, we design a Detailed Feature Enhancement (DFE) architecture to exploit spatial semantics for small targets. The low-level visual features from CNN intermediate layers are enhanced by using channel and spatial attention. In addition, we adopt an edge detection module to refine the prediction. We conducted extensive experiments on three datasets (i.e., LIP, ATR, and Fashion Clothing) to show the effectiveness of our method, which achieves 54.55% mIoU on the LIP dataset, an average F-1 score of 80.26% on the ATR dataset and an average F-1 score of 55.19% on the Fashion Clothing dataset.


Pages: 16

