Habitat: A Platform for Embodied AI Research

Cited by: 789
Authors
Savva, Manolis [1 ,4 ]
Kadian, Abhishek [1 ]
Maksymets, Oleksandr [1 ]
Zhao, Yili [1 ]
Wijmans, Erik [1 ,2 ,3 ]
Jain, Bhavana [1 ]
Straub, Julian [2 ]
Liu, Jia [1 ]
Koltun, Vladlen [5 ]
Malik, Jitendra [1 ,6 ]
Parikh, Devi [1 ,3 ]
Batra, Dhruv [1 ,3 ]
Affiliations
[1] Facebook AI Res, Menlo Pk, CA 94025 USA
[2] Facebook Reality Labs, Pittsburgh, PA USA
[3] Georgia Inst Technol, Atlanta, GA 30332 USA
[4] Simon Fraser Univ, Burnaby, BC, Canada
[5] Intel Labs, Santa Clara, CA USA
[6] Univ Calif Berkeley, Berkeley, CA USA
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
Keywords
DOI
10.1109/ICCV.2019.00943
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present Habitat, a platform for research in embodied artificial intelligence (AI). Habitat enables training embodied agents (virtual robots) in highly efficient photorealistic 3D simulation. Specifically, Habitat consists of: (i) Habitat-Sim: a flexible, high-performance 3D simulator with configurable agents, sensors, and generic 3D dataset handling. Habitat-Sim is fast - when rendering a scene from Matterport3D, it achieves several thousand frames per second (fps) running single-threaded, and can reach over 10,000 fps multi-process on a single GPU. (ii) Habitat-API: a modular high-level library for end-to-end development of embodied AI algorithms - defining tasks (e.g. navigation, instruction following, question answering), configuring, training, and benchmarking embodied agents. These large-scale engineering contributions enable us to answer scientific questions requiring experiments that were till now impracticable or 'merely' impractical. Specifically, in the context of point-goal navigation: (1) we revisit the comparison between learning and SLAM approaches from two recent works [19, 16] and find evidence for the opposite conclusion - that learning outperforms SLAM if scaled to an order of magnitude more experience than previous investigations, and (2) we conduct the first cross-dataset generalization experiments {train, test} × {Matterport3D, Gibson} for multiple sensors {blind, RGB, RGBD, D} and find that only agents with depth (D) sensors generalize across datasets. We hope that our open-source platform and these findings will advance research in embodied AI.
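To make the Habitat-API workflow described in the abstract concrete, the following is a minimal Python sketch modeled on the project's quick-start example: it loads a point-goal navigation task configuration, creates an environment backed by Habitat-Sim, and steps a random agent until the episode ends. The config path configs/tasks/pointnav.yaml and the presence of locally downloaded scene data (e.g. Matterport3D or Gibson) are assumptions about a standard habitat-api installation, not details given in this record.

import habitat

# Load a point-goal navigation task configuration (assumes the pointnav.yaml
# config shipped with habitat-api and locally downloaded scene data).
config = habitat.get_config("configs/tasks/pointnav.yaml")

# Habitat-API wraps the Habitat-Sim renderer in a task definition:
# episodes, sensors (e.g. RGB, depth), actions, and evaluation metrics.
env = habitat.Env(config=config)

# Reset returns a dict of sensor readings, e.g. "rgb", "depth", "pointgoal".
observations = env.reset()

# Step a random agent until the episode terminates.
while not env.episode_over:
    observations = env.step(env.action_space.sample())

env.close()

A learned agent would replace env.action_space.sample() with a policy acting on the observation dict; task metrics (e.g. SPL for point-goal navigation) are computed by the same library for benchmarking.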
Pages: 9338-9346
Number of pages: 9
References
28 entries in total
[1]  
Ammirato, Phil, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA), P1378, DOI 10.1109/ICRA.2017.7989164
[2]   Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments [J].
Anderson, Peter ;
Wu, Qi ;
Teney, Damien ;
Bruce, Jake ;
Johnson, Mark ;
Sunderhauf, Niko ;
Reid, Ian ;
Gould, Stephen ;
van den Hengel, Anton .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :3674-3683
[3]  
Anderson, Peter, 2018, arXiv:1807.06757
[4]  
[Anonymous], 2015, TUK CENT WORKSH
[5]   VQA: Visual Question Answering [J].
Antol, Stanislaw ;
Agrawal, Aishwarya ;
Lu, Jiasen ;
Mitchell, Margaret ;
Batra, Dhruv ;
Zitnick, C. Lawrence ;
Parikh, Devi .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :2425-2433
[6]   3D Semantic Parsing of Large-Scale Indoor Spaces [J].
Armeni, Iro ;
Sener, Ozan ;
Zamir, Amir R. ;
Jiang, Helen ;
Brilakis, Ioannis ;
Fischer, Martin ;
Savarese, Silvio .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :1534-1543
[7]  
Bewley, A., 2019, IEEE International Conference on Robotics and Automation (ICRA), P4818, DOI 10.1109/ICRA.2019.8793668
[8]  
Brodeur, S., 2017, arXiv:1711.11017
[9]  
Chang, A., 2017, International Conference on 3D Vision (3DV)
[10]  
Das, Abhishek, 2018, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)