ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

Cited by: 204
Authors
Shridhar, Mohit [1 ]
Thomason, Jesse [1 ]
Gordon, Daniel [1 ]
Bisk, Yonatan [1 ,2 ,3 ]
Han, Winson [3 ]
Mottaghi, Roozbeh [1 ,3 ]
Zettlemoyer, Luke [1 ]
Fox, Dieter [1 ,4 ]
Affiliations
[1] Univ Washington, Paul G Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA USA
[3] Allen Inst AI, Seattle, WA USA
[4] NVIDIA, Santa Clara, CA USA
Source
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020) | 2020
Keywords
DOI
10.1109/CVPR42600.2020.01075
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. ALFRED includes long, compositional tasks with non-reversible state changes to shrink the gap between research benchmarks and real-world applications. ALFRED consists of expert demonstrations in interactive visual environments for 25k natural language directives. These directives contain both high-level goals like "Rinse off a mug and place it in the coffee maker." and low-level language instructions like "Walk to the coffee maker on the right." ALFRED tasks are more complex in terms of sequence length, action space, and language than existing vision-and-language task datasets. We show that a baseline model based on recent embodied vision-and-language tasks performs poorly on ALFRED, suggesting that there is significant room for developing innovative grounded visual language understanding models with this benchmark.
Pages: 10737-10746
Page count: 10