Zero-Shot Learning on Human-Object Interaction Recognition in video

Times Cited: 0
Authors
Maraghi, Vali Ollah [1 ]
Faez, Karim [1 ]
Affiliations
[1] Amirkabir Univ Technol, Tehran Polytech, Dept Elect Engn, Tehran, Iran
Source
2019 5TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS 2019) | 2019
Keywords
Human-object interaction; video understanding; recurrent neural network; action recognition; zero-shot learning;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Recognition of human activities is an essential field in computer vision. Many human activities consist of human-object interactions (HOIs). A great deal of successful work has been done on HOI recognition with acceptable results, but these approaches are fully supervised and require labeled training data for every HOI. The space of possible human-object interactions is huge, and listing and providing training data for all categories is costly and impractical. We tackle this problem by proposing an approach for scaling human-object interaction recognition in video data through zero-shot learning. Our method recognizes a verb and an object from a video and composes them into an HOI class. Recognizing verbs and objects instead of whole HOIs allows a new combination of a verb and an object to be identified as a new HOI class that the recognizer model has never seen. We introduce a neural network architecture that can understand video data. The proposed model learns verbs and objects from the available training data during the training phase; at test time it detects verb-object pairs in a video and thereby identifies the HOI class. We evaluated our model on the recently introduced Charades dataset, which contains many HOI categories in videos. We show that our model can detect unseen HOI classes in addition to providing acceptable recognition of seen classes, so a significantly larger number of categories is identifiable than the number of training classes.
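The verb/object composition idea summarized in the abstract can be illustrated with a minimal Python sketch. This is a hypothetical illustration, not the authors' implementation: the vocabularies, scores, seen-pair set, and function names below are all assumptions made for the example.

# A minimal sketch of composing verb and object scores into HOI scores.
# Two classifier branches (not shown) would score verbs and objects for a clip;
# combining those scores lets any verb-object pair, including pairs unseen in
# training, be ranked as an HOI label. All names and values are hypothetical.

import itertools

VERBS = ["hold", "open", "wash"]                  # assumed verb vocabulary
OBJECTS = ["cup", "door", "laptop"]               # assumed object vocabulary
SEEN_HOIS = {("hold", "cup"), ("open", "door")}   # pairs that had training videos

def compose_hoi_scores(verb_scores, object_scores):
    """Combine per-verb and per-object scores into a score for every verb-object pair."""
    return {
        (v, o): verb_scores[v] * object_scores[o]
        for v, o in itertools.product(VERBS, OBJECTS)
    }

# Hypothetical outputs of the verb and object branches for one clip.
verb_scores = {"hold": 0.1, "open": 0.7, "wash": 0.2}
object_scores = {"cup": 0.2, "door": 0.1, "laptop": 0.7}

hoi_scores = compose_hoi_scores(verb_scores, object_scores)
best_pair = max(hoi_scores, key=hoi_scores.get)
print(best_pair, "unseen" if best_pair not in SEEN_HOIS else "seen")
# ('open', 'laptop') is reported even though that HOI never appeared at training time.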
Pages: 7