BundleMoCap++: Efficient, robust and smooth motion capture from sparse multiview videos

Cited by: 0

Authors:

Albanis, Georgios ^{[1,2]}

Zioulis, Nikolaos ^{[2]}

Kolomvatsos, Kostas ^{[1]}

Affiliations:

[1] Univ Thessaly, Dept Informat & Telecommun, Lamia, Greece

[2] Moverse, Thessaloniki, Greece

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2024 / Volume 249

Keywords:

Markerless motion capture; Human pose estimation; Human pose prior; RIEMANNIAN-MANIFOLDS; OPTIMIZATION;

DOI:

10.1016/j.cviu.2024.104190

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Producing smooth and accurate motions from sparse videos without requiring specialized equipment and markers is a long-standing problem in the research community. Most approaches typically involve complex processes such as temporal constraints, multiple stages combining data-driven regression and optimization techniques, and bundle solving over temporal windows. These increase the computational burden and introduce the challenge of hyperparameter tuning for the different objective terms. In contrast, BundleMoCap++ offers a simple yet effective approach to this problem. It solves the motion in a single stage, eliminating the need for temporal smoothness objectives while still delivering smooth motions without compromising accuracy. BundleMoCap++ outperforms the state-of-the-art without increasing complexity. Our approach is based on manifold interpolation between latent keyframes. By relying on a local manifold smoothness assumption and appropriate interpolation schemes, we efficiently solve a bundle of frames using two or more latent codes. Additionally, the method is implemented as a sliding window optimization and requires only the first frame to be properly initialized, reducing the overall computational burden. BundleMoCap++'s strength lies in achieving high-quality motion capture results with fewer computational resources. To do this efficiently, we propose a novel human pose prior that focuses on the geometric aspect of the latent space, modeling it as a hypersphere, allowing for the introduction of sophisticated interpolation techniques. We also propose an algorithm for optimizing the latent variables directly on the learned manifold, improving convergence and performance. Finally, we introduce high-order interpolation techniques adapted for the hypersphere, allowing us to increase the solving temporal window, enhancing performance and efficiency.
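The abstract describes interpolating between latent keyframes on a hypersphere but does not spell out the scheme. As a minimal sketch, assuming plain spherical linear interpolation (slerp) between two unit-norm latent codes, a bundle of intermediate latents could be generated as below. The function names (`normalize`, `slerp`, `interpolate_bundle`) and the use of slerp itself are illustrative assumptions, not the authors' implementation; the high-order schemes mentioned in the abstract are not covered here.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere (assumed latent geometry)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def slerp(z0, z1, t):
    """Spherical linear interpolation between unit vectors z0 and z1, t in [0, 1]."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(z0, z1))))
    omega = math.acos(dot)  # angle between the two latent codes
    if math.pi - omega < 1e-8:
        # Antipodal points: the geodesic is not unique.
        raise ValueError("slerp is undefined for antipodal vectors")
    if omega < 1e-8:
        # Nearly identical codes: linear blend, then project back to the sphere.
        return normalize([(1.0 - t) * a + t * b for a, b in zip(z0, z1)])
    s = math.sin(omega)
    w0 = math.sin((1.0 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def interpolate_bundle(z_start, z_end, num_frames):
    """Return num_frames latent codes along the geodesic from z_start to z_end."""
    if num_frames < 2:
        raise ValueError("a bundle needs at least two frames")
    return [
        slerp(z_start, z_end, i / (num_frames - 1))
        for i in range(num_frames)
    ]


if __name__ == "__main__":
    # Two hypothetical latent keyframes; a real prior would use many more dimensions.
    key_a = normalize([1.0, 0.0, 0.0])
    key_b = normalize([0.0, 1.0, 0.0])
    for z in interpolate_bundle(key_a, key_b, 5):
        print([round(x, 4) for x in z])
```

In such a setup only the two keyframe latents would be optimization variables, while every in-between frame is derived from them; each interpolated code would then be decoded by the pose prior into a body pose, a step that is omitted here.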


Pages: 15

