Long-Term Human Trajectory Prediction Using 3D Dynamic Scene Graphs

Cited by: 2
Authors
Gorlo, Nicolas [1 ]
Schmid, Lukas [1 ]
Carlone, Luca [1 ]
Affiliations
[1] MIT, MIT SPARK Lab, Cambridge, MA 02139 USA
Funding
Academy of Finland; Swiss National Science Foundation;
Keywords
Trajectory; Probabilistic logic; Three-dimensional displays; Predictive models; Indoor environment; Planning; Cognition; Annotations; Service robots; Legged locomotion; AI-enabled robotics; human-centered robotics; service robotics; datasets for human motion; modeling and simulating humans; NAVIGATION;
DOI
10.1109/LRA.2024.3482169
Chinese Library Classification (CLC)
TP24 [Robotics];
Subject Classification Codes
080202; 1405;
Abstract
We present a novel approach for long-term human trajectory prediction in indoor human-centric environments, which is essential for long-horizon robot planning in these environments. State-of-the-art human trajectory prediction methods are limited by their focus on collision avoidance and short-term planning, and their inability to model complex interactions of humans with the environment. In contrast, our approach overcomes these limitations by predicting sequences of human interactions with the environment and using this information to guide trajectory predictions over a horizon of up to 60 s. We leverage Large Language Models (LLMs) to predict interactions with the environment by conditioning the LLM prediction on rich contextual information about the scene. This information is given as a 3D Dynamic Scene Graph that encodes the geometry, semantics, and traversability of the environment into a hierarchical representation. We then ground these interaction sequences into multi-modal spatio-temporal distributions over human positions using a probabilistic approach based on continuous-time Markov Chains. To evaluate our approach, we introduce a new semi-synthetic dataset of long-term human trajectories in complex indoor environments, which also includes annotations of human-object interactions. We show in thorough experimental evaluations that our approach achieves a 54% lower average negative log-likelihood and a 26.5% lower Best-of-20 displacement error compared to the best non-privileged (i.e., evaluated in a zero-shot fashion on the dataset) baselines for a time horizon of 60 s.
Pages: 10978-10985
Page count: 8
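
Illustrative sketch: the abstract describes grounding LLM-predicted interaction sequences into multi-modal spatio-temporal distributions over human positions using continuous-time Markov Chains. The Python code below is a minimal, hypothetical sketch of that general idea only, not the authors' implementation; the waypoint locations, transition rates, and the Gaussian spatial model are assumed placeholders (in the paper, contextual inputs come from the 3D Dynamic Scene Graph, which encodes geometry, semantics, and traversability).

# Hypothetical sketch (not the authors' code): grounding a predicted sequence
# of human-object interactions into a time-dependent distribution over 2D
# positions with a continuous-time Markov chain (CTMC).
import numpy as np
from scipy.linalg import expm
from scipy.stats import multivariate_normal

# Assumed inputs: 2D locations of the predicted interaction objects and one
# transition rate per hop (e.g., derived from walking speed and path length).
waypoints = np.array([[0.0, 0.0], [4.0, 1.0], [7.0, 5.0]])  # [m]
rates = np.array([0.08, 0.05])                              # [1/s]

# CTMC generator for a simple forward chain: state i jumps to i+1 at rates[i];
# the final interaction is an absorbing state.
n = len(waypoints)
Q = np.zeros((n, n))
for i, lam in enumerate(rates):
    Q[i, i] = -lam
    Q[i, i + 1] = lam

def position_density(t, query_xy, sigma=0.8):
    """Predicted spatial density at time t, evaluated at query_xy (shape (M, 2)).

    State occupancy: p(t) = p(0) @ expm(Q t). The spatial model (an isotropic
    Gaussian around each interaction location) is an illustrative assumption.
    """
    p0 = np.zeros(n)
    p0[0] = 1.0                       # the human starts at the first interaction
    p_t = p0 @ expm(Q * t)            # multi-modal weights over interactions
    density = np.zeros(len(query_xy))
    for k, w in enumerate(p_t):
        density += w * multivariate_normal.pdf(
            query_xy, mean=waypoints[k], cov=sigma**2 * np.eye(2))
    return density

# Example: evaluate the predicted density at the interaction locations for the
# 60 s horizon used in the paper's evaluation.
print(position_density(60.0, waypoints))

A forward chain with an absorbing final state keeps the example minimal; the multi-modality of the prediction arises because, at any time t, probability mass is spread over several interaction locations.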