ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling

Cited by: 0
Authors
Jang, Deok-kyeong [1]
Yang, Dongseok [1]
Jang, Deok-yun [1]
Choi, Byeoli [1]
Shin, Donghoon [1]
Lee, Sung-hee [2]
Affiliations
[1] MOVIN Inc, Daejeon 51543, South Korea
[2] Korea Adv Inst Sci & Technol, Daejeon, South Korea
Source
ACM TRANSACTIONS ON GRAPHICS | 2024, Vol. 43, No. 6
Keywords
Motion capture; Motion synthesis; Character animation; Point cloud; Deep learning;
DOI
10.1145/3687991
CLC Number
TP31 [Computer Software];
Discipline Codes
081202 ; 0835 ;
Abstract
This paper introduces ELMO, a real-time upsampling motion capture framework designed for a single LiDAR sensor. Modeled as a conditional autoregressive transformer-based upsampling motion generator, ELMO achieves 60 fps motion capture from a 20 fps LiDAR point cloud sequence. The key feature of ELMO is the coupling of the self-attention mechanism with thoughtfully designed embedding modules for motion and point clouds, significantly elevating the motion quality. To facilitate accurate motion capture, we develop a one-time skeleton calibration model capable of predicting user skeleton offsets from a single-frame point cloud. Additionally, we introduce a novel data augmentation technique utilizing a LiDAR simulator, which enhances global root tracking to improve environmental understanding. To demonstrate the effectiveness of our method, we compare ELMO with state-of-the-art methods in both image-based and point cloud-based motion capture. We further conduct an ablation study to validate our design principles. ELMO's fast inference time makes it well-suited for real-time applications, exemplified in our demo video featuring live streaming and interactive gaming scenarios. Furthermore, we contribute a high-quality LiDAR-mocap synchronized dataset comprising 20 different subjects performing a range of motions, which can serve as a valuable resource for future research. The dataset and evaluation code are available at https://movin3d.github.io/ELMO_SIGASIA2024/
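The abstract describes a conditional autoregressive generator that upsamples a 20 fps LiDAR point-cloud stream into 60 fps motion, i.e. three output frames per input frame, with each new pose conditioned on previously generated motion. The sketch below illustrates only that frame-rate and conditioning structure; the function names, pose dimensionality, and the stub predictor are illustrative assumptions, not the authors' architecture or API (the paper's predictor is a transformer with dedicated motion and point-cloud embedding modules).

```python
import numpy as np

LIDAR_FPS = 20
MOCAP_FPS = 60
UPSAMPLE = MOCAP_FPS // LIDAR_FPS  # 3 motion frames per LiDAR frame

def predict_motion(prev_motion, point_cloud_feat):
    """Stand-in for the conditional autoregressive generator.

    In ELMO this would be a transformer conditioned on point-cloud and
    motion embeddings; here we simply blend the previous pose toward
    the new observation to keep the sketch self-contained.
    """
    return 0.5 * prev_motion + 0.5 * point_cloud_feat

def upsample_capture(point_cloud_seq, pose_dim=72):
    """Autoregressively emit UPSAMPLE poses per incoming LiDAR frame."""
    motion = np.zeros(pose_dim)      # autoregressive state (last pose)
    out = []
    for feat in point_cloud_seq:     # one feature vector per 20 fps frame
        for _ in range(UPSAMPLE):    # three 60 fps frames per input frame
            motion = predict_motion(motion, feat)
            out.append(motion)
    return np.stack(out)

# One second of 20 fps input yields one second of 60 fps motion.
frames = upsample_capture(np.random.randn(20, 72))
print(frames.shape)  # (60, 72)
```

The key point the sketch captures is that upsampling happens inside the generation loop rather than by post-hoc interpolation, so intermediate frames can still react to the latest point-cloud observation.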
Pages: 14
References: 67