BundleMoCap++: Efficient, robust and smooth motion capture from sparse multiview videos

Cited by: 0

Authors:

Albanis, Georgios [1,2]

Zioulis, Nikolaos [2]

Kolomvatsos, Kostas [1]

Affiliations:

[1] Univ Thessaly, Dept Informat & Telecommun, Lamia, Greece

[2] Moverse, Thessaloniki, Greece

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2024 / Vol. 249

Keywords:

Markerless motion capture; Human pose estimation; Human pose prior; RIEMANNIAN-MANIFOLDS; OPTIMIZATION;

DOI:

10.1016/j.cviu.2024.104190

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Producing smooth and accurate motions from sparse videos without requiring specialized equipment and markers is a long-standing problem in the research community. Most approaches typically involve complex processes such as temporal constraints, multiple stages combining data-driven regression and optimization techniques, and bundle solving over temporal windows. These increase the computational burden and introduce the challenge of hyperparameter tuning for the different objective terms. In contrast, BundleMoCap++ offers a simple yet effective approach to this problem. It solves the motion in a single stage, eliminating the need for temporal smoothness objectives while still delivering smooth motions without compromising accuracy. BundleMoCap++ outperforms the state-of-the-art without increasing complexity. Our approach is based on manifold interpolation between latent keyframes. By relying on a local manifold smoothness assumption and appropriate interpolation schemes, we efficiently solve a bundle of frames using two or more latent codes. Additionally, the method is implemented as a sliding window optimization and requires only the first frame to be properly initialized, reducing the overall computational burden. BundleMoCap++'s strength lies in achieving high-quality motion capture results with fewer computational resources. To do this efficiently, we propose a novel human pose prior that focuses on the geometric aspect of the latent space, modeling it as a hypersphere, allowing for the introduction of sophisticated interpolation techniques. We also propose an algorithm for optimizing the latent variables directly on the learned manifold, improving convergence and performance. Finally, we introduce high-order interpolation techniques adapted for the hypersphere, allowing us to increase the solving temporal window, enhancing performance and efficiency.
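As an illustration of interpolating between latent keyframes on a hypersphere, the following is a minimal sketch using standard spherical linear interpolation (slerp). It is not the authors' implementation: the latent dimension (32), the bundle length (8), the random keyframes, and the function names `normalize`, `slerp` and `interpolate_bundle` are all assumptions chosen for the example, and the higher-order interpolation schemes and the pose decoder mentioned in the abstract are not shown.

```python
import numpy as np


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    return v / np.linalg.norm(v)


def slerp(z0, z1, t):
    """Spherical linear interpolation between unit vectors z0 and z1, t in [0, 1].

    Assumes z0 and z1 are not antipodal (that case is not handled here).
    """
    dot = np.clip(np.dot(z0, z1), -1.0, 1.0)
    omega = np.arccos(dot)  # geodesic angle between the two keyframes
    if omega < 1e-8:
        # Keyframes (almost) coincide: nothing to interpolate.
        return z0.copy()
    s = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / s) * z0 + (np.sin(t * omega) / s) * z1


def interpolate_bundle(z0, z1, num_frames):
    """Latent codes for a bundle of frames lying on the geodesic from z0 to z1."""
    ts = np.linspace(0.0, 1.0, num_frames)
    return np.stack([slerp(z0, z1, t) for t in ts])


# Hypothetical latent keyframes (random here; in practice they would be optimized).
rng = np.random.default_rng(0)
z_start = normalize(rng.standard_normal(32))
z_end = normalize(rng.standard_normal(32))

# Eight per-frame latent codes defined by only two keyframes.
bundle = interpolate_bundle(z_start, z_end, num_frames=8)
```

In this sketch every intermediate code stays on the unit hypersphere and consecutive frames are separated by equal geodesic angles, which is the sense in which a bundle of frames can be parameterized by two latent codes without a separate temporal smoothness term.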


Pages: 15

References: 80 in total

