Zero-Shot Learning Using Synthesised Unseen Visual Data with Diffusion Regularisation

Cited by: 77
Authors
Long, Yang [1]
Liu, Li [2, 3]
Shen, Fumin [4]
Shao, Ling [2, 3]
Li, Xuelong [5]
Affiliations
[1] Univ Newcastle, Sch Comp Sci, OpenLab, Newcastle Upon Tyne NE4 5TG, Tyne & Wear, England
[2] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
[3] Univ East Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
[4] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China
[5] Chinese Acad Sci, Xian Inst Opt & Precis Mech, Xian 710119, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Zero-shot learning; data synthesis; diffusion regularisation; visual-semantic embedding; object recognition; RECOGNITION;
DOI
10.1109/TPAMI.2017.2762295
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sufficient training examples are a fundamental requirement for most learning tasks. However, collecting well-labelled training examples is costly. Zero-shot Learning (ZSL) uses visual attributes or natural language semantics as an intermediate-level clue to associate low-level features with high-level classes. In a novel extension of this idea, we aim to synthesise training data for novel classes using only semantic attributes. Despite the simplicity of this idea, there are several challenges. First, how can the synthesised data be prevented from over-fitting to the training classes? Second, how can we guarantee that the synthesised data is discriminative for ZSL tasks? Third, we observe that only a few dimensions of the learnt features have high variance, whereas most of the remaining dimensions are not informative. The question is therefore how to make the concentrated information diffuse to most dimensions of the synthesised data. To address these issues, we propose a novel embedding algorithm named Unseen Visual Data Synthesis (UVDS) that projects semantic features to the high-dimensional visual feature space. Two main techniques are introduced in the proposed algorithm. (1) We introduce a latent embedding space that aims to reconcile the structural difference between the visual and semantic spaces while preserving the local structure. (2) We propose a novel Diffusion Regularisation (DR) that explicitly forces the variances to diffuse over most dimensions of the synthesised data. Through an orthogonal rotation (more precisely, an orthogonal transformation), DR can remove redundant correlated attributes and further alleviate the over-fitting problem. On four benchmark datasets, we demonstrate the benefit of using synthesised unseen data for zero-shot learning. Extensive experimental results suggest that our proposed approach significantly outperforms the state-of-the-art methods.
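The abstract's remark about variance concentrated in a few dimensions can be illustrated with a toy example. The sketch below is not the UVDS algorithm or its Diffusion Regularisation objective; it only demonstrates the general fact that an orthogonal transformation can spread variance evenly across dimensions while leaving the total variance unchanged. The toy data, the choice of a normalised 4x4 Hadamard matrix, and the helper names `variances` and `rotate` are all assumptions made for illustration.

```python
import random


def variances(rows):
    """Per-dimension population variance of a list of equal-length rows."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    return [sum((r[d] - means[d]) ** 2 for r in rows) / n for d in range(dims)]


def rotate(rows, q):
    """Apply the matrix q (a list of rows) to every sample: y = q x."""
    return [[sum(q_row[j] * x[j] for j in range(len(x))) for q_row in q]
            for x in rows]


# Normalised 4x4 Hadamard matrix (illustrative choice): its rows are
# orthonormal, so it is an orthogonal transformation.
H = [[1, 1, 1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, -1, -1, 1]]
Q = [[v / 2.0 for v in row] for row in H]

# Toy "features" (assumed data): dimension 0 carries almost all the variance.
rng = random.Random(0)
X = [[rng.gauss(0, 3.0), rng.gauss(0, 0.1), rng.gauss(0, 0.1), rng.gauss(0, 0.1)]
     for _ in range(2000)]

before = variances(X)
after = variances(rotate(X, Q))

print("variance per dimension before:", [round(v, 3) for v in before])
print("variance per dimension after: ", [round(v, 3) for v in after])
print("total variance before/after:  ", round(sum(before), 6), round(sum(after), 6))
```

Before the rotation one dimension dominates; afterwards the four per-dimension variances are nearly equal, and their sum is the same up to floating-point error. A learned regulariser would have to find a suitable transformation from data rather than use a fixed matrix as this sketch does.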
Pages: 2498-2512
Number of pages: 15