TokenPose: Learning Keypoint Tokens for Human Pose Estimation

Cited by: 147
Authors
Li, Yanjie [1, 2]
Zhang, Shoukui [2]
Wang, Zhicheng [2]
Yang, Sen [2, 3]
Yang, Wankou [3]
Xia, Shu-Tao [1, 4]
Zhou, Erjin [2]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] MEGVII Technol, Beijing, Peoples R China
[3] Southeast Univ, Beijing, Peoples R China
[4] Peng Cheng Lab, PCL Res Ctr Networks & Commun, Shenzhen, Peoples R China
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV48922.2021.01112
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human pose estimation relies heavily on visual cues and anatomical constraints between parts to locate keypoints. Most existing CNN-based methods do well in visual representation but lack the ability to explicitly learn the constraint relationships between keypoints. In this paper, we propose a novel approach based on Token representation for human Pose estimation (TokenPose). In detail, each keypoint is explicitly embedded as a token to simultaneously learn constraint relationships and appearance cues from images. Extensive experiments show that the small and large TokenPose models are on par with state-of-the-art CNN-based counterparts while being more lightweight. Specifically, our TokenPose-S and TokenPose-L achieve 72.5 AP and 75.8 AP on the COCO validation dataset, respectively, with significant reductions in parameters (↓80.6%; ↓56.8%) and GFLOPs (↓75.3%; ↓24.7%). Code is publicly available.
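
The token idea described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. It assumes a single self-attention layer, random (untrained) weights, and a linear heatmap head. All names and sizes (`num_patches`, `dim`, `heatmap_h`, `w_head`, and so on) are hypothetical choices for illustration. Only the count of 17 keypoints reflects the COCO keypoint format.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over all tokens.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(0)
num_patches, num_keypoints, dim = 48, 17, 32  # hypothetical sizes; 17 = COCO keypoints
heatmap_h, heatmap_w = 16, 12                 # hypothetical heatmap resolution

# Stand-in for embedded image patches (visual tokens).
visual_tokens = rng.normal(size=(num_patches, dim))
# One token per keypoint; learnable in a real model, random here.
keypoint_tokens = rng.normal(size=(num_keypoints, dim))

# Keypoint tokens and visual tokens share one sequence, so attention can
# relate keypoints to each other (constraints) and to patches (appearance).
tokens = np.concatenate([keypoint_tokens, visual_tokens], axis=0)
w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)

# Map each output keypoint token to a flattened heatmap (assumed linear head).
w_head = rng.normal(scale=dim ** -0.5, size=(dim, heatmap_h * heatmap_w))
heatmaps = (out[:num_keypoints] @ w_head).reshape(num_keypoints, heatmap_h, heatmap_w)

# Attention blocks: keypoint-to-keypoint and keypoint-to-visual.
kp_to_kp = attn[:num_keypoints, :num_keypoints]
kp_to_visual = attn[:num_keypoints, num_keypoints:]
print(heatmaps.shape, kp_to_kp.shape, kp_to_visual.shape)
```

In this sketch, `kp_to_kp` holds how much each keypoint token attends to the other keypoint tokens, and `kp_to_visual` holds how much it attends to image patches. These are the two kinds of dependency the abstract says the tokens learn simultaneously.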
Pages: 11293-11302
Page count: 10