LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities

Cited by: 18
Authors
Jia, Baoxiong [1 ]
Chen, Yixin [1 ]
Huang, Siyuan [1 ]
Zhu, Yixin [1 ]
Zhu, Song-Chun [1 ]
Affiliations
[1] UCLA, Ctr Vis Cognit Learning & Auton VCLA, Los Angeles, CA 90095 USA
Source
COMPUTER VISION - ECCV 2020, PT XXVI | 2020 / Vol. 12371
Keywords
Dataset; Multi-agent multi-task activities; Compositional action recognition; Action and task anticipations; Multiview; OBJECT;
DOI
10.1007/978-3-030-58574-7_46
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Understanding and interpreting human actions is a longstanding challenge and a critical indicator of perception in artificial intelligence. However, several essential components of daily human activities are largely missing from the prior literature, including goal-directed actions, concurrent multi-tasking, and collaboration among multiple agents. We introduce the LEMMA dataset to provide a single home for addressing these missing dimensions, with meticulously designed settings in which the number of tasks and agents varies to highlight different learning objectives. We densely annotate atomic actions with human-object interactions to provide ground truth for the compositionality, scheduling, and assignment of daily activities. We further devise challenging compositional action recognition and action/task anticipation benchmarks with baseline models to measure the capability of compositional action understanding and temporal reasoning. We hope this effort will drive the machine vision community to examine goal-directed human activities and to further study task scheduling and assignment in the real world.
Pages: 767-786 (20 pages)