Fixed-resolution representation network for human pose estimation

Cited by: 0
Authors
Yongxiang Liu
Xiaorong Hou
Affiliation
[1] School of Automation Engineering, University of Electronic Science and Technology of China
Source
Multimedia Systems | 2022 / Vol. 28
Keywords
Human pose estimation; Fixed-resolution representation; Multi-receptive fields; Feature extraction; Information selection;
DOI
Not available
Abstract
Human pose estimation from a single image is a fundamental yet challenging task in computer vision. Most existing methods, such as Hourglass, HRNet, and their variants, progressively reduce high-resolution features to low resolution to build multi-resolution representations, then recover a higher resolution from the low-resolution features and use it to generate the final pose heatmaps. In this paper, we propose a novel architecture, the fixed-resolution representation network for human pose estimation, which maintains a fixed resolution throughout the whole process to preserve rich spatial-structural information. First, we propose an Improved Pyramid Convolutional Bottleneck (IPCB) that encodes feature maps with multiple receptive fields at the same resolution. Second, we introduce an efficient channel attention mechanism to enhance the feature extraction and information selection capability of IPCB. Third, to address the deviation introduced by the flip test at inference, we adopt an existing technique, Unbiased Data Processing. Fourth, given the change in model structure and the limited computing resources, we introduce an iterative retraining strategy to address the problem of pre-training. We empirically demonstrate the effectiveness of our method: with 1.7M parameters and 3G FLOPs, it achieves 89.5 (PCKh@0.5) on MPII and 92.7 (PCK@0.2) on LSP, which is competitive with state-of-the-art methods on these benchmark keypoint detection datasets.
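The PCKh@0.5 score quoted in the abstract can be illustrated with a minimal sketch. It assumes the common definition of PCKh: a predicted keypoint counts as correct when its distance to the ground truth is at most 0.5 times the head segment length of that person. The function name `pckh` and the input layout are hypothetical choices for illustration, not the authors' evaluation code.

```python
import math

def pckh(pred, gt, head_sizes, alpha=0.5):
    """Fraction of keypoints within alpha * head size of the ground truth.

    pred, gt: lists of poses, each pose a list of (x, y) keypoints.
    head_sizes: one head segment length per pose (assumed input format).
    """
    correct = 0
    total = 0
    for pose_pred, pose_gt, head in zip(pred, gt, head_sizes):
        for (px, py), (gx, gy) in zip(pose_pred, pose_gt):
            # Euclidean distance between prediction and ground truth
            dist = math.hypot(px - gx, py - gy)
            correct += dist <= alpha * head
            total += 1
    return correct / total if total else 0.0

# One pose with two keypoints and a head size of 10 pixels:
# the first keypoint is 5 px off (correct), the second is 20 px off (wrong).
score = pckh([[(0, 0), (10, 10)]], [[(3, 4), (10, 30)]], [10])
```

Benchmark results are usually reported as percentages, so a score of 0.895 from this function would correspond to 89.5. PCK@0.2 follows the same pattern with `alpha=0.2` and a different reference length, commonly the torso size.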
Pages: 1597-1609
Page count: 12