Multi-view 3D Smooth Human Pose Estimation based on Heatmap Filtering and Spatio-temporal Information

Cited by: 3
Authors
Niu, Zehai [1]
Lu, Ke [1, 2]
Xue, Jian [1]
Ma, Haifeng [1]
Wei, Runchen [1]
Affiliations
[1] Univ Chinese Acad Sci, Beijing, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021
Funding
National Natural Science Foundation of China;
Keywords
3D Human Pose Estimation; Multiple View Geometry; Temporal;
DOI
10.1145/3474085.3475185
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The estimation of 3D human poses from time-synchronized, calibrated multi-view video usually consists of two steps: (1) a 2D detector that locates the 2D position of each joint via heatmaps for each frame and (2) a post-processing method, such as the recursive pictorial structure model or robust triangulation, that recovers the 3D joint coordinates. However, most existing methods operate on a single frame only. They do not take advantage of the temporal characteristics of the video sequence itself and must rely on post-processing algorithms. They are also susceptible to human self-occlusion, and the generated sequences suffer from jitter. Therefore, we propose a network model incorporating spatial and temporal features. Using a coarse-to-fine approach, the proposed heatmap temporal network (HTN) generates temporal heatmap information, with an occlusion heatmap filter used to filter out low-quality heatmaps before they are sent to the HTN. The heatmap fusion and triangulation weights are dynamically adjusted, and intermediate supervision is employed to enable better integration of temporal and spatial information. Our network is also end-to-end differentiable. This overcomes the long-standing problem of skeleton jitter and ensures that the sequence is smooth and stable.
Pages: 442-450
Page count: 9
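Illustrative sketch (not part of the catalog record, and not the authors' implementation): the abstract describes a second step in which per-view 2D joint positions are triangulated into 3D, with triangulation weights that the proposed network adjusts dynamically. The Python sketch below shows only the generic idea of confidence-weighted linear triangulation for one joint. It assumes known 3x4 projection matrices and fixed scalar weights supplied by the caller; in the paper the weights are produced by the network. All function names (`project`, `solve_3x3`, `triangulate_weighted`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def project(P: Matrix, X: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D point X with a 3x4 projection matrix P (pinhole model)."""
    Xh = list(X) + [1.0]
    h = [sum(P[i][k] * Xh[k] for k in range(4)) for i in range(3)]
    return h[0] / h[2], h[1] / h[2]


def solve_3x3(m: Matrix, rhs: Sequence[float]) -> List[float]:
    """Solve m @ x = rhs by Gaussian elimination with partial pivoting."""
    a = [list(row) + [r] for row, r in zip(m, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate view configuration")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def triangulate_weighted(
    projections: Sequence[Matrix],
    points_2d: Sequence[Tuple[float, float]],
    weights: Sequence[float],
) -> List[float]:
    """Weighted linear least-squares triangulation of a single joint.

    Each view contributes two linear constraints on the 3D point:
        (u * p3 - p1) . [X, 1] = 0   and   (v * p3 - p2) . [X, 1] = 0
    where p1, p2, p3 are the rows of that view's projection matrix.
    Each view's squared residuals are scaled by its weight, so a view with
    weight 0 (for example, an occluded joint) is ignored entirely.
    """
    ata = [[0.0] * 3 for _ in range(3)]  # accumulates A^T W A
    atb = [0.0] * 3                      # accumulates A^T W b
    for P, (u, v), w in zip(projections, points_2d, weights):
        for coord, row in ((u, P[0]), (v, P[1])):
            r = [coord * P[2][k] - row[k] for k in range(4)]
            a, b = r[:3], -r[3]
            for i in range(3):
                atb[i] += w * a[i] * b
                for j in range(3):
                    ata[i][j] += w * a[i] * a[j]
    return solve_3x3(ata, atb)


if __name__ == "__main__":
    # Three synthetic cameras: identity rotation, different translations.
    cams = [
        [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
        [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
        [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, -1.0], [0.0, 0.0, 1.0, 0.0]],
    ]
    joint = [0.5, 0.2, 2.0]
    observations = [project(P, joint) for P in cams]
    print(triangulate_weighted(cams, observations, [1.0, 1.0, 1.0]))
```

Setting a view's weight to zero removes its influence on the solution, which is the simplest way to picture how down-weighting unreliable (for example, self-occluded) views can stabilize the 3D estimate. This sketch processes a single frame and does not model the temporal component described in the abstract.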