Zero-Shot Visual Recognition via Bidirectional Latent Embedding

Cited by: 0
Authors
Qian Wang
Ke Chen
Affiliation
[1] The University of Manchester, School of Computer Science
Source
International Journal of Computer Vision | 2017 / Vol. 124
Keywords
Zero-shot learning; Object recognition; Human action recognition; Supervised locality preserving projection; Landmark-based Sammon mapping; Multiple visual and semantic representations
DOI
Not available
Abstract
Zero-shot learning for visual recognition, e.g., object and action recognition, has recently attracted a lot of attention. However, it remains challenging to bridge the semantic gap between visual features and their underlying semantics, and to transfer knowledge to semantic categories unseen during learning. Unlike most existing zero-shot visual recognition methods, we propose a stagewise bidirectional latent embedding framework consisting of two consecutive learning stages. In the bottom-up stage, a latent embedding space is first created by exploiting the topological and labeling information underlying the training data of known classes via a suitable supervised subspace learning algorithm, and the latent embeddings of the training data are used to form landmarks that guide the embedding of the semantics underlying unseen classes into this learned latent space. In the top-down stage, semantic representations of unseen-class labels in a given label vocabulary are then embedded into the same latent space via our proposed semi-supervised Sammon mapping, guided by the landmarks, so as to preserve the semantic relatedness between all classes. The resulting latent embedding space therefore allows the label of a test instance to be predicted with a simple nearest-neighbor rule. To evaluate the effectiveness of the proposed framework, we have conducted extensive experiments on four benchmark datasets in object and action recognition, i.e., AwA, CUB-200-2011, UCF101 and HMDB51. The results of comparative studies demonstrate that our proposed approach yields state-of-the-art performance under both inductive and transductive settings.
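The two-stage pipeline described in the abstract can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The binary same-label affinity used for supervised locality preserving projection, the use of class-mean embeddings as landmarks, the rescaling of semantic distances to the latent scale, the initialization of unseen classes, and the plain gradient descent with step-size adaptation (used here in place of Sammon's original pseudo-Newton update) are all choices made for illustration. The function names `slpp`, `class_landmarks`, `landmark_sammon` and `predict`, and the synthetic data, are hypothetical. The semi-supervised aspect is modeled only by keeping the seen-class landmarks fixed while the unseen-class positions are optimized.

```python
import numpy as np


def pairwise(P, Q=None):
    """Euclidean distances between rows of P and rows of Q (Q defaults to P)."""
    Q = P if Q is None else Q
    return np.sqrt(((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1))


def slpp(X, y, dim, reg=1e-6):
    """Bottom-up stage: a simple supervised LPP (rows of X are samples).

    Assumes W_ij = 1 iff samples i, j share a label. Solves the generalized
    problem X^T L X a = lam X^T D X a and keeps the eigenvectors belonging
    to the `dim` smallest eigenvalues.
    """
    W = (y[:, None] == y[None, :]).astype(float)
    D = np.diag(W.sum(axis=1))
    A = X.T @ (D - W) @ X                          # D - W is the graph Laplacian
    B = X.T @ D @ X + reg * np.eye(X.shape[1])     # ridge keeps B positive definite
    C_inv = np.linalg.inv(np.linalg.cholesky(B))   # B = C C^T
    M = C_inv @ A @ C_inv.T                        # equivalent symmetric problem
    _, vecs = np.linalg.eigh((M + M.T) / 2)        # eigenvalues in ascending order
    return C_inv.T @ vecs[:, :dim]                 # map back: a = C^{-T} v


def class_landmarks(Z, y, classes):
    """Hypothetical landmark choice: mean latent embedding of each seen class."""
    return np.stack([Z[y == c].mean(axis=0) for c in classes])


def sammon_stress(Y, Dstar):
    """Sammon stress between target distances Dstar and distances among rows of Y."""
    iu = np.triu_indices(len(Y), k=1)
    ds, dl = Dstar[iu], pairwise(Y)[iu]
    return ((ds - dl) ** 2 / ds).sum() / ds.sum()


def sammon_grad(Y, Dstar):
    """Gradient of sammon_stress with respect to every row of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    d = np.maximum(pairwise(Y), 1e-12)
    Ds = Dstar.copy()
    np.fill_diagonal(Ds, 1.0)                      # diagonal terms are unused
    coef = (Ds - d) / (Ds * d)
    np.fill_diagonal(coef, 0.0)
    c = np.triu(Dstar, 1).sum()
    return (-2.0 / c) * (coef[:, :, None] * diff).sum(axis=1)


def landmark_sammon(landmarks, S_seen, S_unseen, iters=300, lr=0.5):
    """Top-down stage: place unseen classes by Sammon mapping, landmarks fixed."""
    n_seen = len(S_seen)
    Dsem = pairwise(np.vstack([S_seen, S_unseen]))
    iu = np.triu_indices(n_seen, k=1)
    # Assumption: rescale semantic distances to the latent scale via seen pairs.
    Dstar = Dsem * pairwise(landmarks)[iu].mean() / Dsem[:n_seen, :n_seen][iu].mean()
    # Assumption: start each unseen class at an inverse-distance-weighted landmark mean.
    w = 1.0 / np.maximum(Dsem[n_seen:, :n_seen], 1e-12)
    Y = np.vstack([landmarks, (w / w.sum(axis=1, keepdims=True)) @ landmarks])
    start = stress = sammon_stress(Y, Dstar)
    for _ in range(iters):
        g = sammon_grad(Y, Dstar)
        g[:n_seen] = 0.0                           # landmarks never move
        Y_new = Y - lr * g
        new_stress = sammon_stress(Y_new, Dstar)
        if new_stress <= stress:                   # accept and grow the step
            Y, stress, lr = Y_new, new_stress, lr * 1.1
        else:                                      # reject and shrink the step
            lr *= 0.5
    return Y[n_seen:], start, stress


def predict(X_test, P, prototypes, labels):
    """Nearest-neighbor rule in the shared latent space."""
    return labels[pairwise(X_test @ P, prototypes).argmin(axis=1)]


# Toy demo on synthetic data: 6 classes, 4 seen and 2 unseen.
rng = np.random.default_rng(0)
S = rng.normal(size=(6, 5))                        # class semantic vectors
G = rng.normal(size=(5, 10))                       # hypothetical semantic-to-visual map
seen, unseen = np.arange(4), np.arange(4, 6)
X = np.vstack([S[c] @ G + 0.1 * rng.normal(size=(20, 10)) for c in seen])
y = np.repeat(seen, 20)
P = slpp(X, y, dim=3)
protos, s0, s1 = landmark_sammon(class_landmarks(X @ P, y, seen), S[seen], S[unseen])
X_test = np.vstack([S[c] @ G + 0.1 * rng.normal(size=(5, 10)) for c in unseen])
pred = predict(X_test, P, protos, unseen)
print(pred, round(s0, 4), round(s1, 4))
```

Because a step is accepted only when it lowers the stress, the final stress `s1` never exceeds the initial stress `s0`. Test predictions are drawn only from the unseen labels by construction, which mirrors the zero-shot setting: no visual example of classes 4 and 5 is seen during training.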
Pages: 356-383
Number of pages: 27