LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities

Cited: 18
Authors
Jia, Baoxiong [1 ]
Chen, Yixin [1 ]
Huang, Siyuan [1 ]
Zhu, Yixin [1 ]
Zhu, Song-Chun [1 ]
Affiliations
[1] UCLA, Ctr Vis Cognit Learning & Auton VCLA, Los Angeles, CA 90095 USA
Source
COMPUTER VISION - ECCV 2020, PT XXVI | 2020, Vol. 12371
Keywords
Dataset; Multi-agent multi-task activities; Compositional action recognition; Action and task anticipations; Multiview; OBJECT;
DOI
10.1007/978-3-030-58574-7_46
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Understanding and interpreting human actions is a longstanding challenge and a critical indicator of perception in artificial intelligence. However, several essential components of daily human activities are largely missing from prior literature, including goal-directed actions, concurrent multi-tasking, and collaboration among multiple agents. We introduce the LEMMA dataset to provide a single home for addressing these missing dimensions with meticulously designed settings, wherein the number of tasks and agents varies to highlight different learning objectives. We densely annotate atomic actions with human-object interactions to provide ground truths for the compositionality, scheduling, and assignment of daily activities. We further devise challenging compositional action recognition and action/task anticipation benchmarks with baseline models to measure the capability of compositional action understanding and temporal reasoning. We hope this effort will drive the machine vision community to examine goal-directed human activities and to further study task scheduling and assignment in the real world.
Pages: 767-786 (20 pages)