BundleMoCap++: Efficient, robust and smooth motion capture from sparse multiview videos

Cited by: 0
Authors
Albanis, Georgios [1, 2]
Zioulis, Nikolaos [2 ]
Kolomvatsos, Kostas [1 ]
Affiliations
[1] Univ Thessaly, Dept Informat & Telecommun, Lamia, Greece
[2] Moverse, Thessaloniki, Greece
Keywords
Markerless motion capture; Human pose estimation; Human pose prior; Riemannian manifolds; Optimization
DOI
10.1016/j.cviu.2024.104190
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Producing smooth and accurate motions from sparse videos without requiring specialized equipment and markers is a long-standing problem in the research community. Most approaches typically involve complex processes such as temporal constraints, multiple stages combining data-driven regression and optimization techniques, and bundle solving over temporal windows. These increase the computational burden and introduce the challenge of hyperparameter tuning for the different objective terms. In contrast, BundleMoCap++ offers a simple yet effective approach to this problem. It solves the motion in a single stage, eliminating the need for temporal smoothness objectives while still delivering smooth motions without compromising accuracy. BundleMoCap++ outperforms the state-of-the-art without increasing complexity. Our approach is based on manifold interpolation between latent keyframes. By relying on a local manifold smoothness assumption and appropriate interpolation schemes, we efficiently solve a bundle of frames using two or more latent codes. Additionally, the method is implemented as a sliding window optimization and requires only the first frame to be properly initialized, reducing the overall computational burden. BundleMoCap++'s strength lies in achieving high-quality motion capture results with fewer computational resources. To do this efficiently, we propose a novel human pose prior that focuses on the geometric aspect of the latent space, modeling it as a hypersphere, allowing for the introduction of sophisticated interpolation techniques. We also propose an algorithm for optimizing the latent variables directly on the learned manifold, improving convergence and performance. Finally, we introduce high-order interpolation techniques adapted for the hypersphere, allowing us to increase the solving temporal window, enhancing performance and efficiency.
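The abstract describes solving a bundle of frames by interpolating between latent keyframes that lie on a hypersphere. As a rough illustration of that idea only, the sketch below uses plain spherical linear interpolation (slerp) between two unit-norm latent codes. It is not the authors' implementation: the function names, the latent dimension of 32, and the bundle length of 8 are assumptions made for the example, and the paper's pose decoder, on-manifold optimizer, and high-order interpolation schemes are not shown.

```python
import numpy as np


def normalize(v, eps=1e-12):
    """Project a vector onto the unit hypersphere."""
    return v / max(np.linalg.norm(v), eps)


def slerp(z0, z1, t):
    """Spherical linear interpolation between two latent codes, t in [0, 1].

    Note: this simple version does not handle exactly antipodal inputs,
    where the connecting great circle is not unique.
    """
    z0, z1 = normalize(z0), normalize(z1)
    # Angle between the two codes on the sphere.
    omega = np.arccos(np.clip(np.dot(z0, z1), -1.0, 1.0))
    if omega < 1e-6:
        # Nearly identical codes: linear blend, then re-project.
        return normalize((1.0 - t) * z0 + t * z1)
    sin_omega = np.sin(omega)
    w0 = np.sin((1.0 - t) * omega) / sin_omega
    w1 = np.sin(t * omega) / sin_omega
    return w0 * z0 + w1 * z1


def interpolate_bundle(z_start, z_end, num_frames):
    """Return one latent code per frame between two keyframe codes."""
    ts = np.linspace(0.0, 1.0, num_frames)
    return np.stack([slerp(z_start, z_end, t) for t in ts])


# Hypothetical keyframe latents (assumed dimension: 32).
rng = np.random.default_rng(0)
z_a = normalize(rng.standard_normal(32))
z_b = normalize(rng.standard_normal(32))

# Eight per-frame latents from only two keyframe codes.
bundle = interpolate_bundle(z_a, z_b, num_frames=8)
```

In a setup like this, only the keyframe codes would be free variables during optimization, and every intermediate frame stays on the sphere by construction, which is one way to obtain smooth motion without a separate temporal smoothness term.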
Pages: 15