Fine-Grained Feature Generation for Generalized Zero-Shot Video Classification

Cited by: 5
Authors
Hong, Mingyao [1 ]
Zhang, Xinfeng [1 ]
Li, Guorong [1 ]
Huang, Qingming [1 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China
Keywords
Visualization; Semantics; Task analysis; Training; Generative adversarial networks; Feature extraction; Data models; Zero-shot learning; feature generation; video classification
DOI
10.1109/TIP.2023.3247167
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Generalized zero-shot video classification aims to train a classifier that can recognize videos from both seen and unseen classes. Since unseen videos provide no visual information during training, most existing methods rely on generative adversarial networks to synthesize visual features for unseen classes from the class embeddings of category names. However, most category names describe only the content of the video and ignore other relational information. As rich information carriers, videos contain actions, performers, environments, etc., and the semantic descriptions of videos also express events at different levels of action. To fully exploit this video information, we propose a fine-grained feature generation model for generalized zero-shot video classification based on video category names and their corresponding description texts. To obtain comprehensive information, we first extract content information from coarse-grained semantic information (category names) and motion information from fine-grained semantic information (description texts) as the basis for feature synthesis. We then subdivide motion into hierarchical constraints on the fine-grained correlation between events and actions at the feature level. In addition, we propose a loss that avoids the imbalance between positive and negative examples in order to constrain the consistency of features at each level. To validate the proposed framework, we perform extensive quantitative and qualitative evaluations on two challenging datasets, UCF101 and HMDB51, and obtain a positive gain on the task of generalized zero-shot video classification.
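The core idea described in the abstract — synthesizing visual features for unseen classes by conditioning a generator on class-level semantic embeddings — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the dimensions, the linear "generator", and the class names are all hypothetical stand-ins for a trained GAN generator.

```python
import numpy as np

# Hedged sketch of conditional feature synthesis for generalized
# zero-shot classification. All names/dimensions below are assumptions
# chosen for illustration, not values from the paper.

rng = np.random.default_rng(0)

FEAT_DIM = 512    # dimensionality of visual video features (assumed)
EMB_DIM = 300     # dimensionality of class embeddings, e.g. word vectors (assumed)
NOISE_DIM = 64    # noise input to the generator (assumed)

# A single random linear map stands in for the trained GAN generator:
# it maps [class embedding; noise] -> synthetic visual feature.
W = rng.standard_normal((EMB_DIM + NOISE_DIM, FEAT_DIM)) * 0.1

def synthesize_features(class_emb, n_samples):
    """Synthesize n_samples visual features for one class, conditioned
    on its semantic embedding (the role the GAN generator plays)."""
    z = rng.standard_normal((n_samples, NOISE_DIM))
    cond = np.tile(class_emb, (n_samples, 1))   # repeat the class condition
    return np.tanh(np.hstack([cond, z]) @ W)    # fake visual features

# Unseen classes have semantic embeddings but no real videos at training
# time; their features are synthesized instead. Class names are made up.
unseen_embs = {c: rng.standard_normal(EMB_DIM) for c in ["fencing", "surfing"]}
fake_feats = {c: synthesize_features(e, 50) for c, e in unseen_embs.items()}

# A downstream classifier would then be trained on real seen-class
# features plus these synthetic unseen-class features (omitted here).
print(fake_feats["fencing"].shape)  # (50, 512)
```

The paper additionally conditions generation on fine-grained description texts and adds hierarchical consistency constraints; the sketch above only shows the basic conditional-generation step common to this family of methods.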
Pages: 1599-1612
Page count: 14