Segmentation Guided Attention Networks for Human Pose Estimation

Cited by: 0

Authors:

Tang, Jingfan [1]

Lu, Jipeng [1]

Zhang, Xuefeng [2,3]

Zhao, Fang [4]

Affiliations:

[1] Hangzhou Dianzi Univ, Coll Comp, Hangzhou 310018, Peoples R China

[2] Ningbo Univ, Coll Sci & Technol, Lab Intelligent Home Appliances, Ningbo 315300, Peoples R China

[3] Ningbo Univ, Coll Sci & Technol, Sch Informat Engn, Ningbo 315300, Peoples R China

[4] Zhejiang Shuren Univ, Coll Informat Sci & Technol, Hangzhou 310015, Peoples R China

Source:

TRAITEMENT DU SIGNAL | 2024 / Vol. 41 / No. 05

Keywords:

human pose estimation; segmentation guided attention; spatial attention maps; deep learning; accuracy improvement;

DOI:

10.18280/ts.410522

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human pose estimation is an important and widely studied task in computer vision. One of the difficulties in human pose estimation is that the model is vulnerable to complex backgrounds when making predictions. In this paper, we propose a deep high-resolution network based on segmentation guidance. A conceptually simple yet computationally efficient segmentation guided module is used to generate segmentation maps. The resulting segmentation map is then used as a spatial attention map in the feature extraction stage. Because the skeletal point region is treated as the foreground in the segmentation map, the model pays more attention to the keypoint region, which effectively reduces the influence of complex backgrounds on the prediction results. Unlike the traditional spatial attention mechanism, the segmentation guided module provides a spatial attention map that carries prior knowledge. To verify the effectiveness of our method, we conducted a series of comparison experiments on the MPII human pose dataset and the COCO2017 keypoint detection dataset. On the COCO2017 dataset, our model improves on HRNet by up to 3%. The experimental results show that this segmentation guidance mechanism is effective in improving accuracy.
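The attention step described in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `apply_segmentation_attention`, the toy tensors, and the choice of a plain element-wise product are all hypothetical, since the abstract does not say how the segmentation map is fused with the features.

```python
from typing import List

Map2D = List[List[float]]   # H x W grid
Features = List[Map2D]      # C x H x W feature tensor as nested lists


def apply_segmentation_attention(features: Features, seg_map: Map2D) -> Features:
    """Weight every channel of `features` by a foreground-probability map.

    Assumption: attention is a plain element-wise product. The fusion rule
    actually used in the paper is not given in the abstract.
    """
    height, width = len(seg_map), len(seg_map[0])
    for channel in features:
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("feature channel and segmentation map shapes differ")
    return [
        [[channel[y][x] * seg_map[y][x] for x in range(width)] for y in range(height)]
        for channel in features
    ]


# Toy input: 2 channels on a 2 x 2 spatial grid.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
# Hypothetical segmentation output: 1.0 near a keypoint, 0.0 on background.
seg_map = [[1.0, 0.0], [0.0, 1.0]]

attended = apply_segmentation_attention(features, seg_map)
print(attended)
```

In this toy case, activations at background positions are zeroed while those in the keypoint region pass through unchanged, which mirrors the abstract's claim that the model attends more to the keypoint region. A real network would operate on learned feature maps and a predicted, soft segmentation map.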


Pages: 2485-2493

Page count: 9

