Multi-view 3D Smooth Human Pose Estimation based on Heatmap Filtering and Spatio-temporal Information

Cited by: 3

Authors:

Niu, Zehai [1]

Lu, Ke [1,2]

Xue, Jian [1]

Ma, Haifeng [1]

Wei, Runchen [1]

Affiliations:

[1] Univ Chinese Acad Sci, Beijing, Peoples R China

[2] Peng Cheng Lab, Shenzhen, Peoples R China

Source:

PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021

Funding:

National Natural Science Foundation of China;

Keywords:

3D Human Pose Estimation; Multiple View Geometry; Temporal;

DOI:

10.1145/3474085.3475185

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The estimation of 3D human poses from time-synchronized, calibrated multi-view video usually consists of two steps: (1) a 2D detector that locates the 2D coordinates of each joint via heatmaps for each frame and (2) a post-processing method, such as the recursive pictorial structure model or robust triangulation, that obtains the 3D coordinates. However, most existing methods are based on a single frame only. They do not take advantage of the temporal characteristics of the video sequence itself, and must rely on post-processing algorithms. They are also susceptible to human self-occlusion, and the generated sequences suffer from jitter. Therefore, we propose a network model incorporating spatial and temporal features. Using a coarse-to-fine approach, the proposed heatmap temporal network (HTN) generates temporal heatmap information, with an occlusion heatmap filter used to filter low-quality heatmaps before they are sent to the HTN. The heatmap fusion and the triangulation weights are dynamically adjusted, and intermediate supervision is employed to enable better integration of temporal and spatial information. Our network is also end-to-end differentiable. This overcomes the long-standing problem of skeleton jitter and ensures that the sequence is smooth and stable.
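The triangulation step mentioned in the abstract can be illustrated with a minimal sketch of generic, per-view weighted linear triangulation. This is not the authors' network or their learned weighting: the function names (`triangulate_weighted`, `solve3`, `det3`), the inhomogeneous least-squares formulation, and the fixed scalar weights are all assumptions chosen for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("degenerate camera configuration")
    out = []
    for c in range(3):
        # Replace column c of a with b, then take the determinant ratio.
        m = [[b[i] if j == c else a[i][j] for j in range(3)] for i in range(3)]
        out.append(det3(m) / d)
    return out


def triangulate_weighted(proj_mats, points_2d, weights):
    """Triangulate one joint from several calibrated views.

    proj_mats: list of 3x4 projection matrices (nested lists), one per view.
    points_2d: list of (u, v) joint detections, one per view.
    weights:   list of per-view confidences (hypothetical; a weight of 0
               removes that view from the solution).
    Returns [X, Y, Z], the weighted least-squares 3D point.
    """
    ata = [[0.0] * 3 for _ in range(3)]  # accumulates M^T M
    atb = [0.0] * 3                      # accumulates M^T b
    for P, (u, v), w in zip(proj_mats, points_2d, weights):
        for coord, row in ((u, P[0]), (v, P[1])):
            # Each image coordinate gives one linear constraint:
            # (coord * P[2] - row) . (X, Y, Z, 1) = 0, scaled by the weight.
            r = [w * (coord * P[2][k] - row[k]) for k in range(4)]
            for i in range(3):
                atb[i] -= r[i] * r[3]
                for j in range(3):
                    ata[i][j] += r[i] * r[j]
    return solve3(ata, atb)
```

With two or more non-degenerate views, lowering the weight of a view (for example one whose heatmap is judged unreliable because of occlusion) reduces its influence on the recovered 3D point.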

Pages: 442-450

Page count: 9

References: 29 in total

[1] 2D Human Pose Estimation: New Benchmark and State of the Art Analysis [J].

Andriluka, Mykhaylo ;

Pishchulin, Leonid ;

Gehler, Peter ;

Schiele, Bernt .

2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :3686-3693

[2] Exploiting Spatial-temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks [J].

Cai, Yujun ;

Ge, Liuhao ;

Liu, Jun ;

Cai, Jianfei ;

Cham, Tat-Jen ;

Yuan, Junsong ;

Thalmann, Nadia Magnenat .

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :2272-2281

[3] OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields [J].

Cao, Zhe ;

Hidalgo, Gines ;

Simon, Tomas ;

Wei, Shih-En ;

Sheikh, Yaser .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (01) :172-186

[4]

Chen T., 2020, arXiv

[5] Cascaded Pyramid Network for Multi-Person Pose Estimation [J].

Chen, Yilun ;

Wang, Zhicheng ;

Peng, Yuxiang ;

Zhang, Zhiqiang ;

Yu, Gang ;

Sun, Jian .

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :7103-7112

[6] HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation [J].

Cheng, Bowen ;

Xiao, Bin ;

Wang, Jingdong ;

Shi, Honghui ;

Huang, Thomas S. ;

Zhang, Lei .

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :5385-5394

[7] Learning 3D Human Pose from Structure and Motion [J].

Dabral, Rishabh ;

Mundhada, Anurag ;

Kusupati, Uday ;

Afaque, Safeer ;

Sharma, Abhishek ;

Jain, Arjun .

COMPUTER VISION - ECCV 2018, PT IX, 2018, 11213 :679-696

[8]

Hartley R., 2003, Multiple view geometry in computer vision

[9]

He Y., 2020, 2020 IEEECVF C COMPU, P7779

[10] Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments [J].

Ionescu, Catalin ;

Papava, Dragos ;

Olaru, Vlad ;

Sminchisescu, Cristian .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (07) :1325-1339
