Deep Dual Consecutive Network for Human Pose Estimation

Times cited: 104

Authors:

Liu, Zhenguang [1]

Chen, Haoming [1]

Feng, Runyang [1]

Wu, Shuang [2]

Ji, Shouling [3]

Yang, Bailin [1]

Wang, Xun [1]

Affiliations:

[1] Zhejiang Gongshang Univ, Hangzhou, Peoples R China

[2] Nanyang Technol Univ, Singapore, Singapore

[3] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

Funding:

National Natural Science Foundation of China

DOI:

10.1109/CVPR46437.2021.00059

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Multi-frame human pose estimation in complex scenarios is challenging. Although state-of-the-art human joint detectors achieve remarkable results on static images, their performance falls short when these models are applied to video sequences. Common shortcomings include failure to handle motion blur, video defocus, and pose occlusions, which stem from their inability to capture the temporal dependency among video frames. On the other hand, directly employing conventional recurrent neural networks runs into empirical difficulties in modeling spatial context, especially when dealing with pose occlusions. In this paper, we propose a novel multi-frame human pose estimation framework that leverages abundant temporal cues between video frames to facilitate keypoint detection. Our framework consists of three modular components. A Pose Temporal Merger encodes keypoint spatiotemporal context to generate effective search scopes, while a Pose Residual Fusion module computes weighted pose residuals in dual directions. These are then processed by our Pose Correction Network to efficiently refine the pose estimates. Our method ranks first in the Multi-frame Person Pose Estimation Challenge on the large-scale benchmark datasets PoseTrack2017 and PoseTrack2018. We have released our code in the hope of inspiring future research.
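The abstract states that a Pose Residual Fusion module computes weighted pose residuals in dual directions. The Python/NumPy sketch below illustrates one plausible reading of that idea and is not the authors' released implementation. The function name `weighted_pose_residuals`, the heatmap shape (K, H, W), and the weighting of each residual by the inverse of its frame distance are all assumptions made for illustration.

```python
import numpy as np


def weighted_pose_residuals(h_prev, h_cur, h_next, d_prev, d_next):
    """Hypothetical sketch of dual-direction weighted pose residuals.

    h_prev, h_cur, h_next: keypoint heatmaps of shape (K, H, W) for the
        previous, current (key) and next frames.
    d_prev, d_next: positive frame distances from the key frame.

    Assumption (not taken from the paper): each residual is a plain heatmap
    difference scaled by 1 / frame distance, so closer frames weigh more.
    """
    if d_prev <= 0 or d_next <= 0:
        raise ValueError("frame distances must be positive")
    # Backward direction: change from the previous frame to the key frame.
    backward = (h_cur - h_prev) / d_prev
    # Forward direction: change from the key frame to the next frame.
    forward = (h_next - h_cur) / d_next
    # Stack the key-frame heatmap and both residuals along the channel axis,
    # giving an array of shape (3K, H, W).
    return np.concatenate([h_cur, backward, forward], axis=0)


# Usage: 17 keypoints on 64x48 heatmaps, neighbours 1 and 2 frames away.
k, h, w = 17, 64, 48
fused = weighted_pose_residuals(
    np.zeros((k, h, w)), np.ones((k, h, w)), np.ones((k, h, w)), 1, 2
)
print(fused.shape)  # (51, 64, 48)
```

In this sketch the fused tensor gives a downstream refinement stage (the role the abstract assigns to the Pose Correction Network) access to motion cues from both temporal directions alongside the key-frame estimate.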

Pages: 525-534

Number of pages: 10