Disentangling Semantic-to-Visual Confusion for Zero-Shot Learning

Cited by: 19
Authors
Ye, Zihan [1 ]
Hu, Fuyuan [1 ]
Lyu, Fan [2 ]
Li, Linyan [3 ]
Huang, Kaizhu [4 ]
Affiliations
[1] Suzhou Univ Sci & Technol, Suzhou 215009, Peoples R China
[2] Tianjin Univ, Tianjin 300000, Peoples R China
[3] Suzhou Inst Trade & Commerce, Suzhou 215009, Jiangsu, Peoples R China
[4] Xian Jiaotong Liverpool Univ, Dept Elect & Elect Engn, Suzhou 215123, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Semantics; Training; Manganese; Extraterrestrial measurements; Generative adversarial networks; Search problems; Zero-shot learning; generative adversarial network; representation learning; deep learning; CLASSIFICATION;
DOI
10.1109/TMM.2021.3089017
Chinese Library Classification
TP [Automation & Computer Technology]
Discipline code
0812
Abstract
Using generative models to synthesize visual features from a semantic distribution has become one of the most popular approaches to zero-shot learning (ZSL) image classification in recent years. The triplet loss (TL) is widely used to generate realistic visual distributions from semantics by automatically searching for discriminative representations. However, the traditional TL cannot reliably search for disentangled representations of unseen classes, because unseen classes are unavailable during ZSL training. To alleviate this drawback, we propose a multi-modal triplet loss (MMTL) that exploits multi-modal information to search a disentangled representation space. As a result, all classes can interact, which benefits learning disentangled class representations in the searched space. Furthermore, we develop a novel model, the Disentangling Class Representation Generative Adversarial Network (DCR-GAN), which exploits the disentangled representations in the training, feature-synthesis, and final recognition stages. Benefiting from the disentangled representations, DCR-GAN fits a more realistic distribution over both seen and unseen features. Extensive experiments show that the proposed model outperforms state-of-the-art methods on four benchmark datasets.
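To make the abstract's starting point concrete, the following is a minimal sketch of a *standard* hinge-style triplet loss on feature vectors, the baseline the paper's MMTL extends. It is not the multi-modal MMTL itself; the function name, margin value, and toy vectors are all illustrative assumptions.

```python
# Illustrative sketch of a standard triplet loss (NOT the paper's MMTL):
# pull an anchor toward a positive example and push it away from a
# negative example until they are separated by at least a margin.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on feature vectors (hypothetical helper)."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to negative
    return max(d_pos - d_neg + margin, 0.0)  # zero once the margin is met

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close to the anchor
n = np.array([2.0, 0.0])   # far from the anchor
print(triplet_loss(a, p, n))  # 0.0 — the negative is already beyond the margin
```

In ZSL feature synthesis, the limitation the abstract points out is that such triplets can only be formed from seen classes, so the loss never shapes the representation space around unseen classes.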
Pages: 2828-2840 (13 pages)