LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities

Cited: 18
Authors
Jia, Baoxiong [1 ]
Chen, Yixin [1 ]
Huang, Siyuan [1 ]
Zhu, Yixin [1 ]
Zhu, Song-Chun [1 ]
Affiliations
[1] UCLA, Ctr Vis Cognit Learning & Auton VCLA, Los Angeles, CA 90095 USA
Source
COMPUTER VISION - ECCV 2020, PT XXVI | 2020, Vol. 12371
Keywords
Dataset; Multi-agent multi-task activities; Compositional action recognition; Action and task anticipations; Multiview; OBJECT;
DOI
10.1007/978-3-030-58574-7_46
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Understanding and interpreting human actions is a longstanding challenge and a critical indicator of perception in artificial intelligence. However, several essential components of daily human activities are largely missing from prior literature, including goal-directed actions, concurrent multi-tasking, and collaboration among multiple agents. We introduce the LEMMA dataset to provide a single home for addressing these missing dimensions with meticulously designed settings, wherein the number of tasks and agents varies to highlight different learning objectives. We densely annotate atomic actions with human-object interactions to provide ground truths for the compositionality, scheduling, and assignment of daily activities. We further devise challenging compositional action recognition and action/task anticipation benchmarks with baseline models to measure the capability of compositional action understanding and temporal reasoning. We hope this effort will drive the machine vision community to examine goal-directed human activities and to further study task scheduling and assignment in the real world.
Pages: 767-786 (20 pages)