Enhanced Context Learning with Transformer for Human Parsing

Cited by: 1

Authors:

Song, Jingya [1,2,3]

Shi, Qingxuan [1,2,3]

Li, Yihang [1,2,3]

Yang, Fang [1,2,3]

Affiliations:

[1] Hebei Univ, Sch Cyber Secur & Comp, Baoding 071002, Peoples R China

[2] Hebei Univ, Hebei Machine Vis Engn Res Ctr, Baoding 071002, Peoples R China

[3] Hebei Univ, Inst Intelligent Image & Document Informat Proc, Baoding 071002, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 15

Keywords:

human parsing; semantic segmentation; deep learning; SEGMENTATION;

DOI:

10.3390/app12157821

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

Human parsing is a fine-grained human semantic segmentation task in computer vision. Because of occlusion, diverse poses, and the similar appearance of different body parts and clothing, human parsing depends heavily on learning context information. Based on this observation, we enhance the learning of global and local information to obtain more accurate human parsing results. We introduce a Global Transformer Module (GTM) that uses a self-attention mechanism to capture long-range dependencies and extract context information effectively. Moreover, we design a Detailed Feature Enhancement (DFE) architecture to exploit spatial semantics for small targets: the low-level visual features from CNN intermediate layers are enhanced with channel and spatial attention. In addition, we adopt an edge detection module to refine the prediction. We conducted extensive experiments on three datasets (LIP, ATR, and Fashion Clothing) to show the effectiveness of our method, which achieves 54.55% mIoU on the LIP dataset, an average F-1 score of 80.26% on the ATR dataset, and an average F-1 score of 55.19% on the Fashion Clothing dataset.
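Note on the mIoU metric: the abstract reports mIoU (mean intersection over union) on LIP. The following is a minimal sketch of how mIoU is commonly computed for a single pair of label maps. It is not taken from the paper; the function name `mean_iou`, the flat-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, and the paper's actual evaluation protocol may differ (for example, by accumulating counts over the whole dataset).

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in pred or target.

    pred, target: flat sequences of integer class labels, one per pixel.
    Assumption: classes absent from both sequences are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where both agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labeled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.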


Pages: 16

