ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

Cited by: 204
Authors
Shridhar, Mohit [1 ]
Thomason, Jesse [1 ]
Gordon, Daniel [1 ]
Bisk, Yonatan [1 ,2 ,3 ]
Han, Winson [3 ]
Mottaghi, Roozbeh [1 ,3 ]
Zettlemoyer, Luke [1 ]
Fox, Dieter [1 ,4 ]
Affiliations
[1] Univ Washington, Paul G Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA USA
[3] Allen Inst AI, Seattle, WA USA
[4] NVIDIA, Santa Clara, CA USA
Source
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020) | 2020
Keywords
DOI
10.1109/CVPR42600.2020.01075
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. ALFRED includes long, compositional tasks with non-reversible state changes to shrink the gap between research benchmarks and real-world applications. ALFRED consists of expert demonstrations in interactive visual environments for 25k natural language directives. These directives contain both high-level goals like "Rinse off a mug and place it in the coffee maker." and low-level language instructions like "Walk to the coffee maker on the right." ALFRED tasks are more complex in terms of sequence length, action space, and language than existing vision-and-language task datasets. We show that a baseline model based on recent embodied vision-and-language tasks performs poorly on ALFRED, suggesting that there is significant room for developing innovative grounded visual language understanding models with this benchmark.
Pages: 10737-10746
Page count: 10