Kimera: From SLAM to spatial perception with 3D dynamic scene graphs

Cited by: 105

Authors:

Rosinol, Antoni [1]

Violette, Andrew [1]

Abate, Marcus [1]

Hughes, Nathan [1]

Chang, Yun [1]

Shi, Jingnan [1]

Gupta, Arjun [1]

Carlone, Luca [1]

Affiliations:

[1] MIT, Lab Informat & Decis Syst, 77 Massachusetts Ave, Cambridge, MA 02139 USA

Source:

INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH | 2021 / Vol. 40 / Issue 12-14

Keywords:

Localization; mapping; SLAM; sensing and perception; computer vision; SIMULTANEOUS LOCALIZATION; MOTION; TRACKING; RECOGNITION; ALGORITHMS; ROBUST; RECONSTRUCTION; SEGMENTATION; PEOPLE; MAPS

DOI:

10.1177/02783649211056674

Chinese Library Classification (CLC) number:

TP24 [Robotics]

Discipline classification code:

080202; 1405

Abstract:

Humans are able to form a complex mental model of the environment they move in. This mental model captures geometric and semantic aspects of the scene, describes the environment at multiple levels of abstraction (e.g., objects, rooms, buildings), and includes static and dynamic entities and their relations (e.g., a person is in a room at a given time). In contrast, current robots' internal representations still provide a partial and fragmented understanding of the environment, either in the form of a sparse or dense set of geometric primitives (e.g., points, lines, planes, and voxels), or as a collection of objects. This article attempts to reduce the gap between robot and human perception by introducing a novel representation, a 3D dynamic scene graph (DSG), that seamlessly captures metric and semantic aspects of a dynamic environment. A DSG is a layered graph where nodes represent spatial concepts at different levels of abstraction, and edges represent spatiotemporal relations among nodes. Our second contribution is Kimera, the first fully automatic method to build a DSG from visual-inertial data. Kimera includes accurate algorithms for visual-inertial simultaneous localization and mapping (SLAM), metric-semantic 3D reconstruction, object localization, human pose and shape estimation, and scene parsing. Our third contribution is a comprehensive evaluation of Kimera on real-life datasets and photo-realistic simulations, including a newly released dataset, uHumans2, which simulates a collection of crowded indoor and outdoor scenes. Our evaluation shows that Kimera achieves competitive performance in visual-inertial SLAM, estimates an accurate 3D metric-semantic mesh model in real time, and builds a DSG of a complex indoor environment with tens of objects and humans in minutes. Our final contribution is to showcase how to use a DSG for real-time hierarchical semantic path-planning. The core modules in Kimera have been released open source.
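
Illustrative note (not part of the published abstract): the description of a DSG as a layered graph, with nodes at different levels of abstraction and edges carrying spatiotemporal relations, can be pictured with a small data structure. The Python sketch below is only a toy model under stated assumptions. The layer names are taken from the abstract's examples (objects, rooms, buildings), and every class, field, and method name (`Node`, `Edge`, `LayeredSceneGraph`, `container_at`) is hypothetical; none of it is Kimera's actual API or the authors' data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed layers, borrowed from the abstract's examples; the real DSG
# layering is defined in the paper, not in this record.
LAYERS = ("object_or_agent", "room", "building")


@dataclass(frozen=True)
class Node:
    node_id: str
    layer: str   # one of LAYERS
    label: str   # semantic label, e.g. "person" or "kitchen"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str                 # e.g. "in"
    time: Optional[float] = None  # timestamp; None marks a static relation


@dataclass
class LayeredSceneGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        if node.layer not in LAYERS:
            raise ValueError(f"unknown layer: {node.layer}")
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(edge)

    def container_at(self, node_id: str, time: float) -> Optional[Node]:
        """Return the target of the latest 'in' edge from node_id at or before time."""
        best = None  # (timestamp, target id)
        for e in self.edges:
            if e.source != node_id or e.relation != "in":
                continue
            # Static edges are treated as valid from the beginning of time.
            t = float("-inf") if e.time is None else e.time
            if t <= time and (best is None or t >= best[0]):
                best = (t, e.target)
        return None if best is None else self.nodes[best[1]]


# Toy scene: one building, two rooms, and a person who changes rooms.
graph = LayeredSceneGraph()
graph.add_node(Node("b1", "building", "office"))
graph.add_node(Node("r1", "room", "kitchen"))
graph.add_node(Node("r2", "room", "corridor"))
graph.add_node(Node("h1", "object_or_agent", "person"))
graph.add_edge(Edge("r1", "b1", "in"))
graph.add_edge(Edge("r2", "b1", "in"))
graph.add_edge(Edge("h1", "r1", "in", time=0.0))
graph.add_edge(Edge("h1", "r2", "in", time=5.0))
```

In this toy scene, `graph.container_at("h1", 2.0)` resolves to the kitchen node and `graph.container_at("h1", 6.0)` to the corridor node, mirroring the abstract's example of a person being in a room at a given time.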


Pages: 1510-1546

Page count: 37
