TokenPose: Learning Keypoint Tokens for Human Pose Estimation

Cited by: 147

Authors:

Li, Yanjie [1,2]

Zhang, Shoukui [2]

Wang, Zhicheng [2]

Yang, Sen [2,3]

Yang, Wankou [3]

Xia, Shu-Tao [1,4]

Zhou, Erjin [2]

Affiliations:

[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China

[2] MEGVII Technol, Beijing, Peoples R China

[3] Southeast Univ, Nanjing, Peoples R China

[4] Peng Cheng Lab, PCL Res Ctr Networks & Commun, Shenzhen, Peoples R China

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

Funding:

National Natural Science Foundation of China

DOI:

10.1109/ICCV48922.2021.01112

CLC number (Chinese Library Classification):

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Human pose estimation relies heavily on visual cues and on anatomical constraints between body parts to locate keypoints. Most existing CNN-based methods learn strong visual representations but lack the ability to explicitly learn the constraint relationships between keypoints. In this paper, we propose a novel approach based on Token representation for human Pose estimation (TokenPose). Specifically, each keypoint is explicitly embedded as a token so that constraint relationships and appearance cues can be learned from images simultaneously. Extensive experiments show that the small and large TokenPose models are on par with state-of-the-art CNN-based counterparts while being more lightweight. In particular, TokenPose-S and TokenPose-L achieve 72.5 AP and 75.8 AP, respectively, on the COCO validation set, with substantial reductions in parameters (↓80.6% and ↓56.8%) and GFLOPs (↓75.3% and ↓24.7%). Code is publicly available.
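The core idea in the abstract, keypoints represented as tokens that attend both to each other and to image features, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single one-head self-attention layer with random, untrained weights; it omits position embeddings, LayerNorm, and MLP blocks; and it uses a linear head that maps each output keypoint token to a flattened heatmap. All function names and sizes (17 keypoints as in COCO, 16 visual tokens, 8-dimensional tokens, 4×4 heatmaps) are hypothetical choices for illustration.

```python
import math
import random

# Minimal pure-Python sketch (hypothetical names, untrained random weights):
# keypoint tokens are concatenated with visual tokens, mixed by one
# self-attention layer, and only the keypoint tokens are decoded to heatmaps.

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(row[t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence of tokens."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = 1.0 / math.sqrt(len(w_q[0]))
    scores = matmul(q, transpose(k))
    attn = [softmax([s * scale for s in row]) for row in scores]
    return matmul(attn, v)

def keypoint_token_forward(visual_tokens, keypoint_tokens, weights, heatmap_hw):
    """Attend jointly over [keypoint tokens; visual tokens], then decode keypoints."""
    sequence = keypoint_tokens + visual_tokens
    mixed = self_attention(sequence, weights["q"], weights["k"], weights["v"])
    # Residual connection around the attention layer.
    mixed = [[x + y for x, y in zip(m, s)] for m, s in zip(mixed, sequence)]
    keypoint_out = mixed[:len(keypoint_tokens)]   # visual tokens are not read out
    flat = matmul(keypoint_out, weights["head"])  # one flattened H*W heatmap per keypoint
    h, w = heatmap_hw
    return [[row[r * w:(r + 1) * w] for r in range(h)] for row in flat]

def argmax_2d(heatmap):
    """Return (row, col) of the highest heatmap value as the predicted location."""
    _, r, c = max((v, r, c) for r, vals in enumerate(heatmap) for c, v in enumerate(vals))
    return r, c

def build_demo(num_keypoints=17, num_patches=16, dim=8, heatmap_hw=(4, 4), seed=0):
    rng = random.Random(seed)
    visual_tokens = random_matrix(num_patches, dim, rng)      # stand-in for image features
    keypoint_tokens = random_matrix(num_keypoints, dim, rng)  # learnable in a real model
    weights = {
        "q": random_matrix(dim, dim, rng),
        "k": random_matrix(dim, dim, rng),
        "v": random_matrix(dim, dim, rng),
        "head": random_matrix(dim, heatmap_hw[0] * heatmap_hw[1], rng),
    }
    return keypoint_token_forward(visual_tokens, keypoint_tokens, weights, heatmap_hw)

heatmaps = build_demo()
print(len(heatmaps), len(heatmaps[0]), len(heatmaps[0][0]))  # 17 4 4
print([argmax_2d(hm) for hm in heatmaps[:3]])
```

Because attention runs over the joint sequence, each keypoint token can attend to the other keypoint tokens (where constraint relationships could be captured) and to the visual tokens (where appearance cues come from), while only the keypoint tokens are decoded into heatmaps.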

Pages: 11293–11302

Page count: 10

References: 44 in total
