Relation-Based Associative Joint Location for Human Pose Estimation in Videos

Cited by: 18
Authors
Dang, Yonghao [1]
Yin, Jianqin [1]
Zhang, Shaojie [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Semantics; Pose estimation; Feature extraction; Correlation; Videos; Heating systems; Convolution; Human pose estimation; keypoint detection; relation modeling; temporal consistency; FLEXIBLE MIXTURES;
DOI
10.1109/TIP.2022.3177959
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Video-based human pose estimation (VHPE) is a vital yet challenging task. Although deep learning algorithms have made tremendous progress in VHPE, many of these approaches model the long-range interaction between joints only implicitly, either by expanding the receptive field of the convolution or by designing a graph manually. Unlike prior methods, we design a lightweight, plug-and-play joint relation extractor (JRE) to explicitly and automatically model the associative relationship between joints. The JRE takes the pseudo heatmaps of joints as input and calculates their similarity. In this way, the JRE can flexibly learn the correlation between any two joints, allowing it to learn the rich spatial configuration of human poses. Furthermore, the JRE can infer invisible joints from the correlation between joints, which is beneficial for locating occluded joints. Then, combining the JRE with temporal semantic continuity modeling, we propose a Relation-based Pose Semantics Transfer Network (RPSTN) for video-based human pose estimation. Specifically, to capture the temporal dynamics of poses, the pose semantic information of the current frame is transferred to the next frame with a joint relation guided pose semantics propagator (JRPSP). The JRPSP can transfer the pose semantic features from a non-occluded frame to an occluded frame. The proposed RPSTN achieves state-of-the-art or competitive results on the video-based Penn Action, Sub-JHMDB, PoseTrack2018, and HiEve datasets. Moreover, the proposed JRE improves the performance of backbones on the image-based COCO2017 dataset. Code is available at https://github.com/YHDang/pose-estimation.
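The abstract states that the JRE takes pseudo heatmaps of joints as input and calculates their similarity. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes cosine similarity as the similarity measure (the abstract does not name one), uses toy Gaussian heatmaps, and all function names (`gaussian_heatmap`, `joint_relation_matrix`, and so on) are hypothetical.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=1.5):
    """Toy pseudo heatmap (assumption): a 2D Gaussian bump centred on (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def flatten(heatmap):
    """Turn an H x W heatmap into a flat vector of length H * W."""
    return [value for row in heatmap for value in row]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def joint_relation_matrix(heatmaps):
    """Return a K x K matrix whose (i, j) entry is the similarity of joints i and j."""
    vectors = [flatten(h) for h in heatmaps]
    k = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(k)]
            for i in range(k)]


# Three toy joints on a 16 x 16 grid: two close together, one far away.
heatmaps = [
    gaussian_heatmap(16, 16, cx=4, cy=4),
    gaussian_heatmap(16, 16, cx=5, cy=5),
    gaussian_heatmap(16, 16, cx=12, cy=12),
]
relation = joint_relation_matrix(heatmaps)

for row in relation:
    print(["%.3f" % value for value in row])
```

In this toy setting, joints whose heatmaps overlap receive a high pairwise score and distant joints a score near zero. A learned module such as the JRE would presumably operate on network-predicted heatmaps and train the relation end to end; that part is outside this sketch.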
Pages: 3973-3986
Page count: 14