ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

Cited by: 204
Authors
Shridhar, Mohit [1 ]
Thomason, Jesse [1 ]
Gordon, Daniel [1 ]
Bisk, Yonatan [1 ,2 ,3 ]
Han, Winson [3 ]
Mottaghi, Roozbeh [1 ,3 ]
Zettlemoyer, Luke [1 ]
Fox, Dieter [1 ,4 ]
Affiliations
[1] Univ Washington, Paul G Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA USA
[3] Allen Inst AI, Seattle, WA USA
[4] NVIDIA, Santa Clara, CA USA
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
DOI
10.1109/CVPR42600.2020.01075
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. ALFRED includes long, compositional tasks with non-reversible state changes to shrink the gap between research benchmarks and real-world applications. ALFRED consists of expert demonstrations in interactive visual environments for 25k natural language directives. These directives contain both high-level goals like "Rinse off a mug and place it in the coffee maker." and low-level language instructions like "Walk to the coffee maker on the right." ALFRED tasks are more complex in terms of sequence length, action space, and language than existing vision-and-language task datasets. We show that a baseline model based on recent embodied vision-and-language tasks performs poorly on ALFRED, suggesting that there is significant room for developing innovative grounded visual language understanding models with this benchmark.
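As the abstract describes, each ALFRED demonstration pairs a high-level goal and step-by-step low-level instructions with an expert action sequence in an interactive environment. A minimal sketch of such a record is below; the field names (`goal`, `instructions`, `actions`) and the action strings are illustrative assumptions, not ALFRED's actual annotation schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Directive:
    """One language annotation paired with an expert demonstration.
    Field names are hypothetical, chosen only to mirror the abstract."""
    goal: str                # high-level goal, e.g. "Rinse off a mug ..."
    instructions: List[str]  # low-level step-by-step instructions
    actions: List[str]       # expert action sequence in the simulator

example = Directive(
    goal="Rinse off a mug and place it in the coffee maker.",
    instructions=[
        "Walk to the coffee maker on the right.",
        "Pick up the mug from the counter.",
    ],
    actions=["MoveAhead", "RotateRight", "PickupObject", "PutObject"],
)

# A model for this benchmark maps (goal, instructions, egocentric frames)
# to an action sequence; here we just inspect the expert trace.
print(len(example.actions))
```

The sketch highlights why the benchmark is hard: the model must ground both levels of language into a long, ordered action sequence, and some actions (e.g. placing an object) cause non-reversible state changes.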
Pages: 10737 - 10746 (10 pages)