Embodied One-Shot Video Recognition: Learning from Actions of a Virtual Embodied Agent

Cited by: 24
Authors
Fu, Yuqian [1 ]
Wang, Chengrong [1 ]
Fu, Yanwei [2 ]
Wang, Yu-Xiong [3 ]
Bai, Cong [4 ]
Xue, Xiangyang [1 ]
Jiang, Yu-Gang [1 ]
Affiliations
[1] Fudan University, School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China
[2] Fudan University, School of Data Science, Shanghai, China
[3] Carnegie Mellon University, Robotics Institute, Pittsburgh, PA 15213, USA
[4] Zhejiang University of Technology, Hangzhou, China
Source
Proceedings of the 27th ACM International Conference on Multimedia (MM '19) | 2019
Funding
National Natural Science Foundation of China
Keywords
One-shot Learning; Video Action Recognition; Embodied Agents;
DOI
10.1145/3343031.3351015
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Discipline codes
081203; 0835;
Abstract
One-shot learning aims to recognize novel target classes from few examples by transferring knowledge from source classes, under the general assumption that the source and target classes are semantically related but not identical. Based on this assumption, recent work has focused on image-based one-shot learning, while little work has addressed video-based one-shot learning. One challenge is that the disjoint-class assumption is difficult to maintain for videos, since clips of target classes may appear within videos of source classes. To address this issue, we introduce a novel setting, termed embodied-agent-based one-shot learning, which leverages synthetic videos produced in a virtual environment to understand realistic videos of target classes. Within this setting, we further propose two types of learning tasks: embodied one-shot video domain adaptation and embodied one-shot video transfer recognition. These tasks serve as a testbed for evaluating video-related one-shot learning. In addition, we propose a general video segment augmentation method that significantly facilitates a variety of one-shot learning tasks. Experimental results validate the soundness of our setting and learning tasks, and demonstrate the effectiveness of our augmentation approach for video recognition in the small-sample-size regime.
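The abstract does not spell out how the video segment augmentation operates; the sketch below is a minimal, hypothetical illustration of segment-level augmentation for few-shot video recognition, assuming a simple scheme that splits a clip into temporal segments and samples one frame per segment to synthesize extra training clips. The function names (segment_video, augment_by_segment_sampling) and the sampling strategy are illustrative assumptions, not the paper's actual method.

import random
from typing import List, Sequence

def segment_video(frames: Sequence, num_segments: int = 8) -> List[list]:
    # Split a frame sequence into up to `num_segments` contiguous temporal segments.
    seg_len = max(1, len(frames) // num_segments)
    segments = [list(frames[i:i + seg_len]) for i in range(0, len(frames), seg_len)]
    return segments[:num_segments]

def augment_by_segment_sampling(frames: Sequence, num_segments: int = 8,
                                num_clips: int = 5, seed: int = 0) -> List[list]:
    # Build extra training clips from a single labelled video by drawing one frame
    # per temporal segment, so each clip keeps the global temporal order while
    # varying locally (a hypothetical stand-in for the paper's segment augmentation).
    rng = random.Random(seed)
    segments = segment_video(frames, num_segments)
    return [[rng.choice(seg) for seg in segments if seg] for _ in range(num_clips)]

# Usage: one 64-frame video, with frames represented by their indices for brevity.
frames = list(range(64))
extra_clips = augment_by_segment_sampling(frames, num_segments=8, num_clips=3)
print(len(extra_clips), len(extra_clips[0]))  # 3 clips, 8 frames each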
Pages: 411-419 (9 pages)