SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation

Cited by: 0
Authors
An, Xiaoqi [1, 2]
Zhao, Lin [1, 2]
Gong, Chen [1]
Wang, Nannan [2]
Wang, Di [2]
Yang, Jian [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Jiangsu Key Lab Image & Video Understanding Socia, PCA Lab, Key Lab Intelligent Percept & Syst High D, Nanjing, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
High-resolution representation is essential for achieving good performance in human pose estimation models. To obtain such features, existing works use high-resolution input images or fine-grained image tokens. However, this dense high-resolution representation brings a significant computational burden. In this paper, we address the following question: "Given that human pose estimation detects only sparse keypoint locations, is it really necessary to describe the whole image in a dense, high-resolution manner?" Based on dynamic transformer models, we propose a framework that uses only Sparse High-resolution Representations for human Pose estimation (SHaRPose). SHaRPose consists of two stages. At the coarse stage, the relations between image regions and keypoints are dynamically mined while a coarse estimation is generated. Then, a quality predictor decides whether the coarse estimation results should be refined. At the fine stage, SHaRPose builds sparse high-resolution representations only on the regions related to the keypoints and provides refined, high-precision human pose estimations. Extensive experiments support the effectiveness of the proposed method. Specifically, compared with the state-of-the-art method ViTPose, our model SHaRPose-Base achieves 77.4 AP (+0.5 AP) on the COCO validation set and 76.7 AP (+0.5 AP) on the COCO test-dev set, and runs inference 1.4x faster than ViTPose-Base. Code is available at https://github.com/AnxQ/sharpose.
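The two-stage control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it only mirrors the coarse stage, quality check, and sparse fine stage at the level of plain Python. All names (`CoarseResult`, `select_regions`, `coarse_to_fine`, the toy stages) and the values of `quality_threshold` and `keep_ratio` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Keypoint = Tuple[float, float]


@dataclass
class CoarseResult:
    """Hypothetical output of a coarse stage (names are illustrative only)."""
    keypoints: List[Keypoint]   # coarse keypoint estimates
    region_scores: List[float]  # how strongly each image region relates to keypoints
    quality: float              # predicted quality of the coarse estimate, in [0, 1]


def select_regions(region_scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the highest-scoring regions, in ascending index order."""
    k = max(1, int(len(region_scores) * keep_ratio))
    ranked = sorted(range(len(region_scores)),
                    key=lambda i: region_scores[i], reverse=True)
    return sorted(ranked[:k])


def coarse_to_fine(
    image: object,
    coarse_stage: Callable[[object], CoarseResult],
    fine_stage: Callable[[object, List[int], List[Keypoint]], List[Keypoint]],
    quality_threshold: float = 0.9,  # assumed value, not from the paper
    keep_ratio: float = 0.25,        # assumed value, not from the paper
) -> Tuple[List[Keypoint], List[int]]:
    """Run the coarse stage, then refine only if the predicted quality is low."""
    coarse = coarse_stage(image)
    if coarse.quality >= quality_threshold:
        # Coarse estimate is judged good enough: skip the fine stage entirely.
        return coarse.keypoints, []
    # Otherwise, refine using only the keypoint-related regions.
    regions = select_regions(coarse.region_scores, keep_ratio)
    return fine_stage(image, regions, coarse.keypoints), regions


# Toy stand-ins for the two stages, used only to exercise the control flow.
def toy_coarse(image: object) -> CoarseResult:
    return CoarseResult(keypoints=[(10.0, 20.0)],
                        region_scores=[0.1, 0.9, 0.3, 0.8],
                        quality=0.5)


def toy_fine(image: object, regions: List[int],
             coarse_keypoints: List[Keypoint]) -> List[Keypoint]:
    # Pretend refinement: shift each coarse keypoint by half a pixel.
    return [(x + 0.5, y + 0.5) for x, y in coarse_keypoints]


keypoints, used_regions = coarse_to_fine(None, toy_coarse, toy_fine, keep_ratio=0.5)
print(keypoints, used_regions)  # [(10.5, 20.5)] [1, 3]
```

In this sketch the saving comes from two places: images whose coarse estimate passes the quality check never reach the fine stage, and the fine stage only ever sees the selected subset of regions.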
Pages: 691 - 699
Number of pages: 9
Related papers
50 in total
  • [41] HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation
    Xu, Zhoujie
    Dai, Meng
    Zhang, Qing
    Jiang, Xiaodi
    NEUROCOMPUTING, 2025, 619
  • [42] Feature Representation for High-resolution Clothed Human Reconstruction
    Pu, Juncheng
    Liu, Li
    Fu, Xiaodong
    Su, Zhuo
    Liu, Lijun
    Peng, Wei
    COMPUTER GRAPHICS FORUM, 2023, 42 (06)
  • [43] A Bayesian Framework for Sparse Representation-Based 3-D Human Pose Estimation
    Babagholami-Mohamadabadi, Behnam
    Jourabloo, Amin
    Zarghami, Ali
    Kasaei, Shohreh
    IEEE SIGNAL PROCESSING LETTERS, 2014, 21 (03) : 297 - 300
  • [44] An Overview of Marine Moving Target Detection via High-resolution Sparse Representation
    Yu, Xiaohan
    Chen, Xiaolong
    Hu, Wenchao
    Guan, Jian
    2016 CIE INTERNATIONAL CONFERENCE ON RADAR (RADAR), 2016
  • [45] Vehicle Detection in High-Resolution Aerial Images via Sparse Representation and Superpixels
    Chen, Ziyi
    Wang, Cheng
    Wen, Chenglu
    Teng, Xiuhua
    Chen, Yiping
    Guan, Haiyan
    Luo, Huan
    Cao, Liujuan
    Li, Jonathan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2016, 54 (01) : 103 - 116
  • [46] NeuralVDB: High-resolution Sparse Volume Representation using Hierarchical Neural Networks
    Kim, Doyub
    Lee, Minjae
    Museth, Ken
    ACM TRANSACTIONS ON GRAPHICS, 2024, 43 (02)
  • [47] DepthFormer: A High-Resolution Depth-Wise Transformer for Animal Pose Estimation
    Liu, Sicong
    Fan, Qingcheng
    Liu, Shanghao
    Zhao, Chunjiang
    AGRICULTURE-BASEL, 2022, 12 (08)
  • [48] Leveraging active perception for real-time high-resolution pose estimation
    Manousis, Theodoros
    Eleftheriadis, Emmanouil
    Passalis, Nikolaos
    Tefas, Anastasios
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 271
  • [49] ENABLING HIGH-RESOLUTION POSE ESTIMATION IN REAL TIME USING ACTIVE PERCEPTION
    Manousis, Theodoros
    Passalis, Nikolaos
    Tefas, Anastasios
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023 : 2425 - 2429
  • [50] CWF-HRNet: Context-aware Fusion for Human Pose Estimation based on High-Resolution Networks
    Ding, Shiyu
    Li, Jin
    Luan, Kuan
    Liang, Hong
    Xing, Jiqing
    2024 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION, ICMA 2024, 2024 : 44 - 49