Disentangling Semantic-to-Visual Confusion for Zero-Shot Learning

Cited by: 19
Authors
Ye, Zihan [1 ]
Hu, Fuyuan [1 ]
Lyu, Fan [2 ]
Li, Linyan [3 ]
Huang, Kaizhu [4 ]
Affiliations
[1] Suzhou Univ Sci & Technol, Suzhou 215009, Peoples R China
[2] Tianjin Univ, Tianjin 300000, Peoples R China
[3] Suzhou Inst Trade & Commerce, Suzhou 215009, Jiangsu, Peoples R China
[4] Xian Jiaotong Liverpool Univ, Dept Elect & Elect Engn, Suzhou 215123, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Semantics; Training; Manganese; Extraterrestrial measurements; Generative adversarial networks; Search problems; Zero-shot learning; generative adversarial network; representation learning; deep learning; CLASSIFICATION;
DOI
10.1109/TMM.2021.3089017
Chinese Library Classification
TP [Automation & Computer Technology]
Discipline code
0812
Abstract
Using generative models to synthesize visual features from a semantic distribution has become one of the most popular approaches to zero-shot learning (ZSL) image classification in recent years. The triplet loss (TL) is widely used to generate realistic visual distributions from semantics by automatically searching for discriminative representations. However, the traditional TL cannot reliably search for disentangled representations of unseen classes, because unseen classes are unavailable during ZSL training. To alleviate this drawback, we propose a multi-modal triplet loss (MMTL) that exploits multi-modal information to search a disentangled representation space. As a result, all classes can interact, which benefits learning disentangled class representations in the searched space. Furthermore, we develop a novel model, the Disentangling Class Representation Generative Adversarial Network (DCR-GAN), which exploits the disentangled representations in the training, feature-synthesis, and final recognition stages. Benefiting from the disentangled representations, DCR-GAN fits a more realistic distribution over both seen and unseen features. Extensive experiments show that the proposed model outperforms state-of-the-art methods on four benchmark datasets.
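To make the abstract's starting point concrete, the following is a minimal sketch of a *standard* hinge-style triplet loss on feature vectors, the baseline the paper's MMTL extends. It is not the multi-modal MMTL itself; the function name, margin value, and toy vectors are all illustrative assumptions.

```python
# Illustrative sketch of a standard triplet loss (NOT the paper's MMTL):
# pull an anchor toward a positive example and push it away from a
# negative example until they are separated by at least a margin.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on feature vectors (hypothetical helper)."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to negative
    return max(d_pos - d_neg + margin, 0.0)  # zero once the margin is met

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close to the anchor
n = np.array([2.0, 0.0])   # far from the anchor
print(triplet_loss(a, p, n))  # 0.0 — the negative is already beyond the margin
```

In ZSL feature synthesis, the limitation the abstract points out is that such triplets can only be formed from seen classes, so the loss never shapes the representation space around unseen classes.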
Pages: 2828-2840 (13 pages)