A Survey of Embodied AI: From Simulators to Research Tasks

Cited by: 148

Authors:

Duan, Jiafei [1]

Yu, Samson [2]

Tan, Hui Li [3]

Zhu, Hongyuan [3]

Tan, Cheston [3]

Affiliations:

[1] Nanyang Technol Univ Singapore, Sch Elect & Elect Engn, Singapore 639798, Singapore

[2] Singapore Univ Technol & Design, Singapore 487372, Singapore

[3] ASTAR, Inst Infocomm Res, Singapore 138632, Singapore

Source:

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2022, Vol. 6, No. 02

Funding:

National Research Foundation, Singapore

Keywords:

Artificial intelligence; Task analysis; Navigation; Physics; Three-dimensional displays; Visualization; Solid modeling; Embodied AI; computer vision; 3D simulators

DOI:

10.1109/TETCI.2022.3141105

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI," where AI algorithms and agents no longer learn from datasets of images, videos or text curated primarily from the internet. Instead, they learn through interactions with their environments from an egocentric perception similar to that of humans. Consequently, there has been substantial growth in the demand for embodied AI simulators to support various embodied AI research tasks. This growing interest in embodied AI is beneficial to the greater pursuit of Artificial General Intelligence (AGI), but there has not been a contemporary and comprehensive survey of this field. This paper aims to provide an encyclopedic survey for the field of embodied AI, from its simulators to its research. By evaluating nine current embodied AI simulators with our proposed seven features, this paper aims to understand the simulators in their provision for use in embodied AI research and their limitations. The paper then surveys the three main research tasks in embodied AI (visual exploration, visual navigation and embodied question answering (QA)), covering the state-of-the-art approaches, evaluation metrics and datasets. Finally, with the new insights revealed through surveying the field, the paper provides suggestions for simulator-for-task selections and recommendations for the future directions of the field.

Pages: 230-244

Page count: 15

