ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling

Cited by: 0
Authors
Jang, Deok-kyeong [1]
Yang, Dongseok [1]
Jang, Deok-yun [1]
Choi, Byeoli [1]
Shin, Donghoon [1]
Lee, Sung-hee [2]
Affiliations
[1] MOVIN Inc, Daejeon 51543, South Korea
[2] Korea Adv Inst Sci & Technol, Daejeon, South Korea
Source
ACM TRANSACTIONS ON GRAPHICS | 2024, Vol. 43, No. 6
Keywords
Motion capture; Motion synthesis; Character animation; Point cloud; Deep learning;
DOI
10.1145/3687991
CLC Number
TP31 [Computer Software];
Discipline Codes
081202 ; 0835 ;
Abstract
This paper introduces ELMO, a real-time upsampling motion capture framework designed for a single LiDAR sensor. Modeled as a conditional autoregressive transformer-based upsampling motion generator, ELMO achieves 60 fps motion capture from a 20 fps LiDAR point cloud sequence. The key feature of ELMO is the coupling of the self-attention mechanism with thoughtfully designed embedding modules for motion and point clouds, significantly elevating the motion quality. To facilitate accurate motion capture, we develop a one-time skeleton calibration model capable of predicting user skeleton offsets from a single-frame point cloud. Additionally, we introduce a novel data augmentation technique utilizing a LiDAR simulator, which enhances global root tracking to improve environmental understanding. To demonstrate the effectiveness of our method, we compare ELMO with state-of-the-art methods in both image-based and point cloud-based motion capture. We further conduct an ablation study to validate our design principles. ELMO's fast inference time makes it well-suited for real-time applications, exemplified in our demo video featuring live streaming and interactive gaming scenarios. Furthermore, we contribute a high-quality LiDAR-mocap synchronized dataset comprising 20 different subjects performing a range of motions, which can serve as a valuable resource for future research. The dataset and evaluation code are available at https://movin3d.github.io/ELMO_SIGASIA2024/
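The abstract describes a conditional autoregressive generator that upsamples a 20 fps LiDAR point-cloud stream into 60 fps motion, i.e. three output frames per input frame, with each new pose conditioned on previously generated motion. The sketch below illustrates only that frame-rate and conditioning structure; the function names, pose dimensionality, and the stub predictor are illustrative assumptions, not the authors' architecture or API (the paper's predictor is a transformer with dedicated motion and point-cloud embedding modules).

```python
import numpy as np

LIDAR_FPS = 20
MOCAP_FPS = 60
UPSAMPLE = MOCAP_FPS // LIDAR_FPS  # 3 motion frames per LiDAR frame

def predict_motion(prev_motion, point_cloud_feat):
    """Stand-in for the conditional autoregressive generator.

    In ELMO this would be a transformer conditioned on point-cloud and
    motion embeddings; here we simply blend the previous pose toward
    the new observation to keep the sketch self-contained.
    """
    return 0.5 * prev_motion + 0.5 * point_cloud_feat

def upsample_capture(point_cloud_seq, pose_dim=72):
    """Autoregressively emit UPSAMPLE poses per incoming LiDAR frame."""
    motion = np.zeros(pose_dim)      # autoregressive state (last pose)
    out = []
    for feat in point_cloud_seq:     # one feature vector per 20 fps frame
        for _ in range(UPSAMPLE):    # three 60 fps frames per input frame
            motion = predict_motion(motion, feat)
            out.append(motion)
    return np.stack(out)

# One second of 20 fps input yields one second of 60 fps motion.
frames = upsample_capture(np.random.randn(20, 72))
print(frames.shape)  # (60, 72)
```

The key point the sketch captures is that upsampling happens inside the generation loop rather than by post-hoc interpolation, so intermediate frames can still react to the latest point-cloud observation.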
Pages: 14
References: 67