Transductive Visual-Semantic Embedding for Zero-shot Learning

Cited by: 8

Authors:

Xu, Xing [1,2]

Shen, Fumin [1,2]

Yang, Yang [1,2]

Shao, Jie [1,2]

Huang, Zi [3]

Affiliations:

[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu, Peoples R China

[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China

[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia

Source:

PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17) | 2017

Funding:

National Natural Science Foundation of China;

Keywords:

Zero-shot learning; transductive learning; matrix factorization; manifold learning;

DOI:

10.1145/3078971.3078977

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Zero-shot learning (ZSL) aims to transfer knowledge from labeled source instances of seen classes to unlabeled target instances of unseen classes via available semantic representations (e.g., attributes). Most existing ZSL approaches achieve this by learning a projection from the visual feature space to the semantic representation space based on the source instances, and directly applying it to the target instances. However, the intrinsic manifold structures residing in both semantic representations and visual features are not effectively incorporated into the learned projection function. Moreover, these methods may suffer from the inherent projection shift problem, due to the disjointness between seen and unseen classes. To overcome these drawbacks, we propose a novel framework termed transductive visual-semantic embedding (TVSE) for ZSL. Specifically, TVSE first learns a latent embedding space to incorporate the manifold structures in both labeled source instances and unlabeled target instances under the transductive setting. In the learned space, each instance is viewed as a mixture of seen class scores. TVSE then constructs the relational mapping between seen and unseen classes using the available semantic representations, and applies it to map the seen class scores of the target instances to their predictions of unseen classes. Extensive experiments on four benchmark datasets demonstrate that the proposed TVSE achieves competitive performance compared with state-of-the-art methods for zero-shot recognition and retrieval tasks.
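
The last step described in the abstract, mapping seen class scores of target instances to unseen class predictions through a seen-to-unseen relation, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the relational mapping is built, so cosine similarity between class attribute vectors is used here purely as an assumption. The function names (`relation_matrix`, `predict_unseen`) and the toy data are hypothetical.

```python
import numpy as np


def relation_matrix(seen_attrs, unseen_attrs):
    """Relate seen classes to unseen classes via their attribute vectors.

    ASSUMPTION: cosine similarity stands in for the relational mapping;
    the abstract does not specify the actual construction used by TVSE.
    """
    s = seen_attrs / np.linalg.norm(seen_attrs, axis=1, keepdims=True)
    u = unseen_attrs / np.linalg.norm(unseen_attrs, axis=1, keepdims=True)
    return s @ u.T  # shape: (n_seen, n_unseen)


def predict_unseen(seen_scores, relation):
    """Map per-instance seen class scores to an unseen class index."""
    unseen_scores = seen_scores @ relation  # shape: (n_instances, n_unseen)
    return unseen_scores.argmax(axis=1)


# Toy data (hypothetical): 3 seen classes, 2 unseen classes, 4 attributes.
seen_attrs = np.array([[1, 0, 1, 0],
                       [0, 1, 0, 1],
                       [1, 1, 0, 0]], dtype=float)
unseen_attrs = np.array([[1, 0, 1, 1],
                         [0, 1, 0, 0]], dtype=float)

# Each row is one target instance expressed as a mixture of seen class scores.
seen_scores = np.array([[0.8, 0.1, 0.1],
                        [0.1, 0.6, 0.3]])

relation = relation_matrix(seen_attrs, unseen_attrs)
predictions = predict_unseen(seen_scores, relation)
print(predictions)
```

In this sketch the first instance, dominated by the first seen class, is assigned to the unseen class whose attributes overlap most with that class. The latent embedding and manifold regularization that produce the seen class scores are outside the scope of the example.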

Citation

Pages: 41-49

Page count: 9

References: 36 in total

