Compositional Learning for Human Object Interaction

Cited by: 78
Authors
Kato, Keizo [1]
Li, Yin [2]
Gupta, Abhinav [2]
Affiliations
[1] Fujitsu Labs Ltd, Kawasaki, Kanagawa, Japan
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
COMPUTER VISION - ECCV 2018, PT XIV | 2018 / Vol. 11218
Keywords
RECOGNITION; LANGUAGE; WORDNET;
DOI
10.1007/978-3-030-01264-9_15
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The world of human-object interactions is rich. While generally we sit on chairs and sofas, if need be we can even sit on TVs or on top of shelves. In recent years, there has been progress in modeling actions and human-object interactions. However, most of these approaches require lots of data. It is not clear if the learned representations of actions are generalizable to new categories. In this paper, we explore the problem of zero-shot learning of human-object interactions. Given limited verb-noun interactions in training data, we want to learn a model that can work even on unseen combinations. To deal with this problem, we propose a novel method using an external knowledge graph and graph convolutional networks which learns how to compose classifiers for verb-noun pairs. We also provide benchmarks on several datasets for zero-shot learning, including both image and video. We hope our method, dataset and baselines will facilitate future research in this direction.
Pages: 247-264
Page count: 18