Towards High Performance Human Keypoint Detection

Cited by: 57
Authors
Zhang, Jing [1]
Chen, Zhe [1]
Tao, Dacheng [1]
Affiliations
[1] Univ Sydney, Fac Engn, Sch Comp Sci, Darlington, NSW 2008, Australia
Funding
Australian Research Council
Keywords
Human Pose Estimation; Deep Neural Networks; Sub-pixel Refinement; Context
DOI
10.1007/s11263-021-01482-8
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human keypoint detection from a single image is very challenging due to occlusion, blur, illumination, and scale variance. In this paper, we address this problem from three aspects by devising an efficient network structure, proposing three effective training strategies, and exploiting four useful postprocessing techniques. First, we find that context information plays an important role in reasoning about human body configuration and invisible keypoints. Inspired by this, we propose a cascaded context mixer (CCM), which efficiently integrates spatial and channel context information and progressively refines them. Second, to maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy that exploits abundant unlabeled data. These strategies enable CCM to learn discriminative features from massive diverse poses. Third, we present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy. Extensive experiments on the MS COCO keypoint detection benchmark demonstrate the superiority of the proposed method over representative state-of-the-art (SOTA) methods. Our single model achieves performance comparable to that of the winner of the 2018 COCO Keypoint Detection Challenge. The final ensemble model sets a new SOTA on this benchmark.
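The abstract does not say which sub-pixel refinement techniques the paper uses. Purely as an illustration of the general idea, the sketch below shows one widely used heuristic for heatmap-based keypoint detectors: take the integer location of the heatmap peak and shift it a quarter pixel toward the higher of its two neighbours along each axis. The function name `refine_keypoint` and the 0.25 offset are assumptions made for this example, not details taken from the paper.

```python
def refine_keypoint(heatmap):
    """Return (x, y) of the heatmap peak with a quarter-pixel nudge.

    `heatmap` is a list of rows (a 2D grid of scores). This is a generic
    heuristic used for illustration, not the method of the paper.
    """
    h = len(heatmap)
    w = len(heatmap[0])

    # Integer location of the maximum response (first one wins on ties).
    best_y, best_x = max(
        ((y, x) for y in range(h) for x in range(w)),
        key=lambda p: heatmap[p[0]][p[1]],
    )
    x, y = float(best_x), float(best_y)

    # Shift 0.25 px toward the larger horizontal neighbour, if both exist.
    if 0 < best_x < w - 1:
        diff = heatmap[best_y][best_x + 1] - heatmap[best_y][best_x - 1]
        x += 0.25 * ((diff > 0) - (diff < 0))

    # Same along the vertical axis.
    if 0 < best_y < h - 1:
        diff = heatmap[best_y + 1][best_x] - heatmap[best_y - 1][best_x]
        y += 0.25 * ((diff > 0) - (diff < 0))

    return x, y


if __name__ == "__main__":
    demo = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.2, 1.0, 0.6],
        [0.0, 0.1, 0.5, 0.3],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(refine_keypoint(demo))
```

Because heatmaps are usually predicted at a lower resolution than the input image, a fractional offset of this kind can reduce the quantization error introduced when the peak is mapped back to image coordinates.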
Pages: 2639-2662
Page count: 24