Embodied One-Shot Video Recognition: Learning from Actions of a Virtual Embodied Agent

Cited by: 24
Authors
Fu, Yuqian [1 ]
Wang, Chengrong [1 ]
Fu, Yanwei [2 ]
Wang, Yu-Xiong [3 ]
Bai, Cong [4 ]
Xue, Xiangyang [1 ]
Jiang, Yu-Gang [1 ]
Affiliations
[1] Fudan University, School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China
[2] Fudan University, School of Data Science, Shanghai, China
[3] Carnegie Mellon University, Robotics Institute, Pittsburgh, PA 15213, USA
[4] Zhejiang University of Technology, Hangzhou, China
Source
Proceedings of the 27th ACM International Conference on Multimedia (MM '19) | 2019
Funding
National Natural Science Foundation of China
Keywords
One-shot Learning; Video Action Recognition; Embodied Agents;
DOI
10.1145/3343031.3351015
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Discipline codes
081203; 0835;
Abstract
One-shot learning aims to recognize novel target classes from few examples by transferring knowledge from source classes, under the general assumption that the source and target classes are semantically related but not identical. Based on this assumption, recent work has focused on image-based one-shot learning, while little work has addressed video-based one-shot learning. One challenge is that the disjoint-class assumption is difficult to maintain for videos, since clips of target classes may appear within videos of source classes. To address this issue, we introduce a novel setting, termed embodied-agent-based one-shot learning, which leverages synthetic videos produced in a virtual environment to understand realistic videos of target classes. Within this setting, we further propose two types of learning tasks: embodied one-shot video domain adaptation and embodied one-shot video transfer recognition. These tasks serve as a testbed for evaluating video-related one-shot learning. In addition, we propose a general video segment augmentation method that significantly facilitates a variety of one-shot learning tasks. Experimental results validate the soundness of our setting and learning tasks, and demonstrate the effectiveness of our augmentation approach for video recognition in the small-sample-size regime.
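The abstract does not spell out how the video segment augmentation operates; the sketch below is a minimal, hypothetical illustration of segment-level augmentation for few-shot video recognition, assuming a simple scheme that splits a clip into temporal segments and samples one frame per segment to synthesize extra training clips. The function names (segment_video, augment_by_segment_sampling) and the sampling strategy are illustrative assumptions, not the paper's actual method.

import random
from typing import List, Sequence

def segment_video(frames: Sequence, num_segments: int = 8) -> List[list]:
    # Split a frame sequence into up to `num_segments` contiguous temporal segments.
    seg_len = max(1, len(frames) // num_segments)
    segments = [list(frames[i:i + seg_len]) for i in range(0, len(frames), seg_len)]
    return segments[:num_segments]

def augment_by_segment_sampling(frames: Sequence, num_segments: int = 8,
                                num_clips: int = 5, seed: int = 0) -> List[list]:
    # Build extra training clips from a single labelled video by drawing one frame
    # per temporal segment, so each clip keeps the global temporal order while
    # varying locally (a hypothetical stand-in for the paper's segment augmentation).
    rng = random.Random(seed)
    segments = segment_video(frames, num_segments)
    return [[rng.choice(seg) for seg in segments if seg] for _ in range(num_clips)]

# Usage: one 64-frame video, with frames represented by their indices for brevity.
frames = list(range(64))
extra_clips = augment_by_segment_sampling(frames, num_segments=8, num_clips=3)
print(len(extra_clips), len(extra_clips[0]))  # 3 clips, 8 frames each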
Pages: 411-419 (9 pages)