Transductive Visual-Semantic Embedding for Zero-shot Learning

被引:8
作者
Xu, Xing [1 ,2 ]
Shen, Fumin [1 ,2 ]
Yang, Yang [1 ,2 ]
Shao, Jie [1 ,2 ]
Huang, Zi [3 ]
机构
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia
来源
PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17) | 2017年
基金
中国国家自然科学基金;
关键词
Zero-shot learning; transductive learning; matrix factorization; manifold learning;
D O I
10.1145/3078971.3078977
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Zero-shot learning (ZSL) aims to bridge the knowledge transfer via available semantic representations (e.g., attributes) between labeled source instances of seen classes and unlabelled target instances of unseen classes. Most existing ZSL approaches achieve this by learning a projection from the visual feature space to the semantic representation space based on the source instances, and directly applying it to the target instances. However, the intrinsic manifold structures residing in both semantic representations and visual features are not effectively incorporated into the learned projection function. Moreover, these methods may suffer from the inherent projection shift problem, due to the disjointness between seen and unseen classes. To overcome these drawbacks, we propose a novel framework termed transductive visual-semantic embedding (TVSE) for ZSL. In specific, TVSE first learns a latent embedding space to incorporate the manifold structures in both labeled source instances and unlabeled target instances under the transductive setting. In the learned space, each instance is viewed as a mixture of seen class scores. TVSE then effectively constructs the relational mapping between seen and unseen classes using the available semantic representations, and applies it to map the seen class scores of the target instances to their predictions of unseen classes. Extensive experiments on four benchmark datasets demonstrate that the proposed TVSE achieves competitive performance compared with the state-of-the-arts for zero-shot recognition and retrieval tasks.
引用
收藏
页码:41 / 49
页数:9
相关论文
共 36 条
  • [1] Akata Z, 2015, PROC CVPR IEEE, P2927, DOI 10.1109/CVPR.2015.7298911
  • [2] Label-Embedding for Attribute-Based Classification
    Akata, Zeynep
    Perronnin, Florent
    Harchaoui, Zaid
    Schmid, Cordelia
    [J]. 2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 819 - 826
  • [3] [Anonymous], 2013, Advances in neural information processing systems
  • [4] [Anonymous], 2016, CVPR, DOI DOI 10.1109/CVPR.2016.17
  • [5] [Anonymous], 2015, CVPR
  • [6] Boyd S, 2004, CONVEX OPTIMIZATION
  • [7] CAI D, 2009, LOCALITY PRESERVING, P1010
  • [8] Synthesized Classifiers for Zero-Shot Learning
    Changpinyo, Soravit
    Chao, Wei-Lun
    Gong, Boqing
    Sha, Fei
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 5327 - 5336
  • [9] Farhadi A, 2009, PROC CVPR IEEE, P1778, DOI 10.1109/CVPRW.2009.5206772
  • [10] Frome A., 2013, ADV NEURAL INFORM PR, P2121, DOI DOI 10.5555/2999792.2999849