Kimera: From SLAM to spatial perception with 3D dynamic scene graphs

Cited by: 105
Authors
Rosinol, Antoni [1]
Violette, Andrew [1]
Abate, Marcus [1]
Hughes, Nathan [1]
Chang, Yun [1]
Shi, Jingnan [1]
Gupta, Arjun [1]
Carlone, Luca [1]
Affiliations
[1] MIT, Lab Informat & Decis Syst, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
Localization; mapping; SLAM; sensing and perception; computer vision; SIMULTANEOUS LOCALIZATION; MOTION; TRACKING; RECOGNITION; ALGORITHMS; ROBUST; RECONSTRUCTION; SEGMENTATION; PEOPLE; MAPS
DOI
10.1177/02783649211056674
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Humans are able to form a complex mental model of the environment they move in. This mental model captures geometric and semantic aspects of the scene, describes the environment at multiple levels of abstraction (e.g., objects, rooms, buildings), and includes static and dynamic entities and their relations (e.g., a person is in a room at a given time). In contrast, current robots' internal representations still provide a partial and fragmented understanding of the environment, either in the form of a sparse or dense set of geometric primitives (e.g., points, lines, planes, and voxels), or as a collection of objects. This article attempts to reduce the gap between robot and human perception by introducing a novel representation, a 3D dynamic scene graph (DSG), that seamlessly captures metric and semantic aspects of a dynamic environment. A DSG is a layered graph where nodes represent spatial concepts at different levels of abstraction, and edges represent spatiotemporal relations among nodes. Our second contribution is Kimera, the first fully automatic method to build a DSG from visual-inertial data. Kimera includes accurate algorithms for visual-inertial simultaneous localization and mapping (SLAM), metric-semantic 3D reconstruction, object localization, human pose and shape estimation, and scene parsing. Our third contribution is a comprehensive evaluation of Kimera in real-life datasets and photo-realistic simulations, including a newly released dataset, uHumans2, which simulates a collection of crowded indoor and outdoor scenes. Our evaluation shows that Kimera achieves competitive performance in visual-inertial SLAM, estimates an accurate 3D metric-semantic mesh model in real time, and builds a DSG of a complex indoor environment with tens of objects and humans in minutes. Our final contribution is to showcase how to use a DSG for real-time hierarchical semantic path-planning. The core modules in Kimera have been released as open source.
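To make the "layered graph" description in the abstract concrete, the following is a minimal Python sketch of a graph whose nodes sit in abstraction layers and whose edges carry a time interval. It is not Kimera's data structure or API: the class `SceneGraph`, the layer names in `LAYERS`, the methods `add_containment` and `ancestors_at`, and the example node names are all hypothetical, chosen only to illustrate the abstract's example of a person being in a room at a given time.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical layers, ordered from least to most abstract. The abstract
# mentions objects, rooms, and buildings; this exact layering is an assumption.
LAYERS = ("object_or_agent", "room", "building")


@dataclass
class SceneGraph:
    # node_id -> layer name
    layer_of: dict = field(default_factory=dict)
    # child_id -> list of (parent_id, t_start, t_end) containment edges
    contained_in: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node_id, layer):
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.layer_of[node_id] = layer

    def add_containment(self, parent_id, child_id,
                        t_start=float("-inf"), t_end=float("inf")):
        """Record that child is inside parent during [t_start, t_end)."""
        parent_rank = LAYERS.index(self.layer_of[parent_id])
        child_rank = LAYERS.index(self.layer_of[child_id])
        # Parents must be strictly more abstract, so the graph stays acyclic.
        if parent_rank <= child_rank:
            raise ValueError("parent must be in a higher layer than child")
        self.contained_in[child_id].append((parent_id, t_start, t_end))

    def parent_at(self, node_id, t):
        """Return the node that contains node_id at time t, or None."""
        for parent_id, t0, t1 in self.contained_in.get(node_id, []):
            if t0 <= t < t1:
                return parent_id
        return None

    def ancestors_at(self, node_id, t):
        """Walk up the layers: e.g. person -> room -> building at time t."""
        chain = []
        current = self.parent_at(node_id, t)
        while current is not None:
            chain.append(current)
            current = self.parent_at(current, t)
        return chain


def build_example():
    """Build a toy graph: one building, two rooms, one moving person."""
    g = SceneGraph()
    g.add_node("building_1", "building")
    g.add_node("kitchen", "room")
    g.add_node("hallway", "room")
    g.add_node("person_1", "object_or_agent")
    g.add_containment("building_1", "kitchen")   # static relation
    g.add_containment("building_1", "hallway")   # static relation
    g.add_containment("kitchen", "person_1", 0.0, 10.0)   # dynamic relation
    g.add_containment("hallway", "person_1", 10.0, 20.0)  # dynamic relation
    return g
```

In this sketch, `build_example().ancestors_at("person_1", 5.0)` walks from the person to the room and then to the building that contain it at that time. A hierarchical planner could use the same upward walk to plan first between rooms and then within a room, although how the article does this is not specified in the abstract.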
Pages: 1510-1546
Page count: 37
Related papers
203 in total
  • [1] Abdulla W., 2017, Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
  • [2] Aldoma A, 2013, IEEE INT CONF ROBOT, P2104, DOI 10.1109/ICRA.2013.6630859
  • [3] Alzantot M., 2012, P 2012 INT C ADV GEO, P99, DOI 10.1145/2424321.2424335
  • [4] SPICE: Semantic Propositional Image Caption Evaluation
    Anderson, Peter
    Fernando, Basura
    Johnson, Mark
    Gould, Stephen
    [J]. COMPUTER VISION - ECCV 2016, PT V, 2016, 9909 : 382 - 398
  • [5] People-tracking-by-detection and people-detection-by-tracking
    Andriluka, Mykhaylo
    Roth, Stefan
    Schiele, Bernt
    [J]. 2008 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-12, 2008, : 1873 - 1880
  • [6] Monocular 3D Pose Estimation and Tracking by Detection
    Andriluka, Mykhaylo
    Roth, Stefan
    Schiele, Bernt
    [J]. 2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2010, : 623 - 630
  • [7] [Anonymous], 2019, IEEE INT CONF ROBOT
  • [8] [Anonymous], 2018, ROBOTICS SCI SYSTEMS
  • [9] 3D Scene Graph: A structure for unified semantics, 3D space, and camera
    Armeni, Iro
    He, Zhi-Yang
    Gwak, JunYoung
    Zamir, Amir R.
    Fischer, Martin
    Malik, Jitendra
    Savarese, Silvio
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5663 - 5672
  • [10] 3D Semantic Parsing of Large-Scale Indoor Spaces
    Armeni, Iro
    Sener, Ozan
    Zamir, Amir R.
    Jiang, Helen
    Brilakis, Ioannis
    Fischer, Martin
    Savarese, Silvio
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 1534 - 1543