BundleMoCap++: Efficient, robust and smooth motion capture from sparse multiview videos

Cited by: 0
Authors
Albanis, Georgios [1, 2]
Zioulis, Nikolaos [2]
Kolomvatsos, Kostas [1]
Affiliations
[1] Univ Thessaly, Dept Informat & Telecommun, Lamia, Greece
[2] Moverse, Thessaloniki, Greece
Keywords
Markerless motion capture; Human pose estimation; Human pose prior; Riemannian manifolds; Optimization
DOI
10.1016/j.cviu.2024.104190
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Producing smooth and accurate motions from sparse videos without requiring specialized equipment and markers is a long-standing problem in the research community. Most approaches involve complex processes such as temporal constraints, multiple stages that combine data-driven regression with optimization, and bundle solving over temporal windows. These components increase the computational burden and make it necessary to tune hyperparameters for the different objective terms. In contrast, BundleMoCap++ offers a simple yet effective approach to this problem. It solves the motion in a single stage, eliminating the need for temporal smoothness objectives while still delivering smooth motions without compromising accuracy. BundleMoCap++ outperforms the state of the art without increasing complexity. Our approach is based on manifold interpolation between latent keyframes. By relying on a local manifold smoothness assumption and appropriate interpolation schemes, we efficiently solve a bundle of frames using two or more latent codes. Additionally, the method is implemented as a sliding window optimization that requires only the first frame to be properly initialized, reducing the overall computational burden. BundleMoCap++'s strength lies in achieving high-quality motion capture results with fewer computational resources. To do this efficiently, we propose a novel human pose prior that focuses on the geometry of the latent space, modeling it as a hypersphere, which enables more sophisticated interpolation techniques. We also propose an algorithm for optimizing the latent variables directly on the learned manifold, improving convergence and performance. Finally, we introduce high-order interpolation techniques adapted to the hypersphere, which allow us to enlarge the temporal window solved at once, improving both performance and efficiency.
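A minimal sketch of two standard geometric operations that the abstract's description suggests, assuming (as the abstract states) that pose latents lie on a unit hypersphere. Spherical linear interpolation (slerp) between two keyframe codes yields one latent code per frame of a window, and a Riemannian gradient step (tangent projection followed by renormalization) keeps an optimized code on the sphere. All function names are hypothetical. The pose decoder and the multiview reprojection objective are omitted, and the paper's actual prior, its higher-order interpolation schemes, and its manifold optimizer may differ from these textbook versions.

```python
import math
from typing import List, Sequence

# Assumption (from the abstract): latent pose codes live on a unit hypersphere.
# All names below are hypothetical illustrations, not the paper's code.
Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vector:
    """Project a nonzero vector onto the unit hypersphere."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def slerp(z0: Sequence[float], z1: Sequence[float], t: float, eps: float = 1e-7) -> Vector:
    """Spherical linear interpolation between unit vectors z0 and z1, t in [0, 1]."""
    omega = math.acos(max(-1.0, min(1.0, dot(z0, z1))))  # geodesic angle between codes
    if omega < eps:
        # Nearly identical codes: a normalized linear blend is numerically safer.
        return normalize([(1.0 - t) * a + t * b for a, b in zip(z0, z1)])
    if math.pi - omega < eps:
        raise ValueError("slerp is undefined for antipodal codes")
    s = math.sin(omega)
    w0 = math.sin((1.0 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def bundle_latents(z_start: Sequence[float], z_end: Sequence[float], num_frames: int) -> List[Vector]:
    """Hypothetical helper: one latent code per frame of a window spanned by two keyframes."""
    if num_frames < 2:
        raise ValueError("a window needs at least its two keyframes")
    return [slerp(z_start, z_end, i / (num_frames - 1)) for i in range(num_frames)]


def riemannian_step(z: Sequence[float], euclidean_grad: Sequence[float], lr: float) -> Vector:
    """One Riemannian gradient-descent step on the unit hypersphere."""
    # Remove the radial component so the update is tangent to the sphere at z.
    radial = dot(euclidean_grad, z)
    tangent = [g - radial * zi for g, zi in zip(euclidean_grad, z)]
    # Step in the tangent space, then retract onto the sphere by renormalizing.
    return normalize([zi - lr * ti for zi, ti in zip(z, tangent)])
```

For example, bundle_latents([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 3) returns the two keyframes plus their geodesic midpoint (about [0.7071, 0.7071, 0.0]), all of unit norm. In a setup like this, only the keyframe codes would be free variables, with every in-between frame derived by interpolation.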
Pages: 15