Transductive Multi-View Zero-Shot Learning

Cited by: 397
Authors
Fu, Yanwei [1 ]
Hospedales, Timothy M. [2 ]
Xiang, Tao [2 ]
Gong, Shaogang [2 ]
Affiliations
[1] Disney Res, Pittsburgh, PA 15213 USA
[2] Queen Mary Univ London, Sch Elect Engn & Comp Sci, London E1 4NS, England
Keywords
Transductive learning; multi-view learning; transfer learning; zero-shot learning; heterogeneous hypergraph; recognition; objects
DOI
10.1109/TPAMI.2015.2408354
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Most existing zero-shot learning approaches exploit transfer learning via an intermediate semantic representation shared between an annotated auxiliary dataset and a target dataset with different classes and no annotation. A projection from a low-level feature space to the semantic representation space is learned from the auxiliary dataset and applied without adaptation to the target dataset. In this paper we identify two inherent limitations with these approaches. First, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. The second limitation is the prototype sparsity problem which refers to the fact that for each target class, only a single prototype is available for zero-shot learning given a semantic representation. To overcome this problem, a novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space. It effectively exploits the complementary information offered by different semantic representations and takes advantage of the manifold structures of multiple representation spaces in a coherent manner. We demonstrate through extensive experiments that the proposed approach (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) significantly outperforms existing methods for both zero-shot and N-shot recognition on three image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.
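The projection-based pipeline the abstract critiques can be illustrated with a toy sketch: learn a feature-to-semantic projection on an annotated auxiliary set, then classify target samples by nearest semantic prototype, with a single prototype per unseen class. This is a generic baseline illustration, not the paper's transductive multi-view method; all names and data below are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: dimensions, data, and the linear model are illustrative.
rng = np.random.default_rng(0)
n_aux, d_feat, d_sem = 200, 20, 5          # auxiliary samples, feature dim, semantic dim
W_true = rng.normal(size=(d_feat, d_sem))  # unknown ground-truth projection

# Auxiliary (annotated) dataset: low-level features with semantic annotations.
X_aux = rng.normal(size=(n_aux, d_feat))
S_aux = X_aux @ W_true

# Learn the feature -> semantic projection by least squares on the auxiliary set.
W, *_ = np.linalg.lstsq(X_aux, S_aux, rcond=None)

# Target domain: disjoint unseen classes, each given by a SINGLE semantic prototype
# (the "prototype sparsity" the abstract refers to).
prototypes = rng.normal(size=(3, d_sem))
x_target = prototypes[1] @ np.linalg.pinv(W)   # a target sample consistent with class 1

# Zero-shot prediction: project without adaptation, then nearest prototype.
s_pred = x_target @ W
pred = int(np.argmin(np.linalg.norm(prototypes - s_pred, axis=1)))
print(pred)  # -> 1
```

Because the projection is applied to the target domain without adaptation, a mismatch between auxiliary and target class statistics biases `s_pred` in real data; that is the projection domain shift the paper's transductive embedding is designed to rectify.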
Pages: 2332-2345
Page count: 14
References (57 in total)
[21] Gong Y., Ke Q., Isard M., Lazebnik S. A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics. International Journal of Computer Vision, 2014, 106(2): 210-233.
[22] Hardoon D. R., Szedmak S., Shawe-Taylor J. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 2004, 16(12): 2639-2664.
[23] Hong C., Yu J., Li J., Chen X. Multi-view hypergraph learning by patch alignment framework. Neurocomputing, 2013, 118: 79-86.
[24] Hospedales T. M. Proceedings of the 2011 IEEE 11th International Conference on Data Mining (ICDM 2011), 2011: 251. DOI 10.1109/ICDM.2011.90.
[25] Huang Y., Liu Q., Zhang S., Metaxas D. N. Image Retrieval via Probabilistic Hypergraph Ranking. 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010: 3376-3383.
[26] Huang Y. C. Proc. CVPR IEEE, 2009: 1738. DOI 10.1109/CVPRW.2009.5206795.
[27] Hwang S. J. CVPR, 2011: 1761.
[28] Hwang S. J., Grauman K. Learning the Relative Importance of Objects from Tagged Images for Retrieval and Cross-Modal Search. International Journal of Computer Vision, 2012, 100(2): 134-153.
[29] Liu J. 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011: 3337. DOI 10.1109/CVPR.2011.5995353.
[30] Krizhevsky A., Sutskever I., Hinton G. E. ImageNet Classification with Deep Convolutional Neural Networks. Communications of the ACM, 2017, 60(6): 84-90.