LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities

Cited by: 18
Authors
Jia, Baoxiong [1 ]
Chen, Yixin [1 ]
Huang, Siyuan [1 ]
Zhu, Yixin [1 ]
Zhu, Song-Chun [1 ]
Affiliations
[1] UCLA, Ctr Vis Cognit Learning & Auton VCLA, Los Angeles, CA 90095 USA
Source
COMPUTER VISION - ECCV 2020, PT XXVI | 2020 / Vol. 12371
Keywords
Dataset; Multi-agent multi-task activities; Compositional action recognition; Action and task anticipations; Multiview; OBJECT;
DOI
10.1007/978-3-030-58574-7_46
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Understanding and interpreting human actions is a longstanding challenge and a critical indicator of perception in artificial intelligence. However, several essential components of daily human activities are largely missing from the prior literature, including goal-directed actions, concurrent multi-tasking, and collaboration among multiple agents. We introduce the LEMMA dataset to provide a single home for addressing these missing dimensions, with meticulously designed settings in which the number of tasks and agents varies to highlight different learning objectives. We densely annotate atomic actions with human-object interactions to provide ground truth for the compositionality, scheduling, and assignment of daily activities. We further devise challenging compositional action recognition and action/task anticipation benchmarks with baseline models to measure the capability of compositional action understanding and temporal reasoning. We hope this effort will drive the machine vision community to examine goal-directed human activities and to further study task scheduling and assignment in the real world.
Pages: 767-786 (20 pages)