Scaling Human-Object Interaction Recognition through Zero-Shot Learning

Cited by: 109
Authors
Shen, Liyue [1]
Yeung, Serena [1]
Hoffman, Judy [2]
Mori, Greg [3]
Li, Fei-Fei [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Univ Calif Berkeley, Berkeley, CA USA
[3] Simon Fraser Univ, Burnaby, BC, Canada
Source
2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018) | 2018
Keywords
DOI
10.1109/WACV.2018.00181
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recognizing human-object interactions (HOI) is an important part of distinguishing the rich variety of human action in the visual world. While recent progress has been made in improving HOI recognition in the fully supervised setting, the space of possible human-object interactions is large, and it is impractical to obtain labeled training data for all interactions of interest. In this work, we tackle the challenge of scaling HOI recognition to the long tail of categories through a zero-shot learning approach. We introduce a factorized model for HOI detection that disentangles reasoning on verbs and objects, and can therefore produce detections for novel verb-object pairs at test time. We present experiments on the recently introduced large-scale HICO-DET dataset, and show that our model performs comparably to the state of the art in fully supervised HOI detection while simultaneously achieving effective zero-shot detection of new HOI categories.
Pages: 1568-1576
Number of pages: 9
Related papers
27 in total
[1]   Label-Embedding for Attribute-Based Classification [J].
Akata, Zeynep ;
Perronnin, Florent ;
Harchaoui, Zaid ;
Schmid, Cordelia .
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, :819-826
[2]   How to Transfer? Zero-Shot Object Recognition via Hierarchical Transfer of Semantic Attributes [J].
Al-Halah, Ziad ;
Stiefelhagen, Rainer .
2015 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2015, :837-843
[3]  
[Anonymous], 2016, PROC CVPR IEEE, DOI 10.1109/CVPR.2016.649
[4]  
[Anonymous], 2015, PROC CVPR IEEE
[5]   Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classification [J].
Bucher, Maxime ;
Herbin, Stephane ;
Jurie, Frederic .
COMPUTER VISION - ECCV 2016, PT V, 2016, 9909 :730-746
[6]  
Cao Z., 2016, arXiv:1611.08050
[7]   HICO: A Benchmark for Recognizing Human-Object Interactions in Images [J].
Chao, Yu-Wei ;
Wang, Zhan ;
He, Yugeng ;
Wang, Jiaxuan ;
Deng, Jia .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :1017-1025
[8]  
Delaitre V., 2011, NIPS, P1503
[9]  
Desai C, 2012, LECT NOTES COMPUT SC, V7575, P158, DOI 10.1007/978-3-642-33765-9_12
[10]  
Girshick R., 2014, P IEEE C COMP VIS PA, DOI 10.1109/CVPR.2014.81