Domain-aware multi-modality fusion network for generalized zero-shot learning

Cited by: 9
Authors
Wang, Jia [1 ,3 ]
Wang, Xiao [1 ,3 ]
Zhang, Han [2 ]
Affiliations
[1] Nankai Univ, Coll Comp Sci, Tianjin 300350, Peoples R China
[2] Nankai Univ, Coll Artificial Intelligence, Tianjin 300350, Peoples R China
[3] Nankai Univ, Tianjin Key Lab Network & Data Secur Technol, Tianjin 300350, Peoples R China
Keywords
Domain Detection; Multi-modality Fusion; Graph Convolutional Network; Generalized Zero-shot Learning
DOI
10.1016/j.neucom.2022.02.056
CLC Classification
TP18 [Artificial intelligence theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Generalized zero-shot learning (GZSL) is a challenging problem that aims to recognize images from both seen and unseen classes. Existing research suffers from the bias problem: models tend to misclassify unseen samples as seen classes. Moreover, recent methods mainly rely on a single semantic representation (e.g., attributes) for knowledge transfer. Although some try to exploit multiple sources of information, they only use simple concatenation or transformations, and the performance is limited. To solve the GZSL problem, we propose a two-step method that overcomes these two challenges progressively. First, a local neighborhood based gating model is designed to leverage both the distribution of the original data space and a learned latent space for domain detection. The model separates seen from unseen samples and thereby decomposes GZSL into a conventional zero-shot learning (ZSL) problem and a supervised classification problem. Then, we design a graph convolutional network (GCN) based model that fuses multiple semantic modalities to solve the decomposed ZSL problem. By using one primary modality as input and another to construct node relationships, our model fuses multiple sources of information effectively and learns more discriminative visual classifiers. We evaluate our method, the local neighborhood based domain-aware and GCN based multi-modality fusion network (LND-GMF), on five benchmark datasets. The results show that our method outperforms state-of-the-art methods by a large margin. (c) 2022 Elsevier B.V. All rights reserved.
Pages: 23-35
Page count: 13
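
The abstract describes a local-neighborhood gating model that scores each test sample in both the original visual feature space and a learned latent space in order to decide whether it belongs to a seen or an unseen class before routing it to the appropriate classifier. The snippet below is a minimal sketch of that idea, assuming a simple k-NN novelty score in each space combined with a fixed mixing weight and threshold; the actual gating function, latent encoder, and calibration used in LND-GMF are not reproduced here.

```python
from sklearn.neighbors import NearestNeighbors

def knn_novelty_score(train_feats, query_feats, k=5):
    """Mean distance from each query to its k nearest seen-class training features."""
    nn = NearestNeighbors(n_neighbors=k).fit(train_feats)
    dists, _ = nn.kneighbors(query_feats)
    return dists.mean(axis=1)

def domain_gate(train_visual, train_latent, query_visual, query_latent,
                alpha=0.5, threshold=1.0, k=5):
    """Blend novelty scores from the original visual space and the learned latent
    space; queries above the threshold are routed to the unseen-class (ZSL) branch,
    the rest to the seen-class supervised classifier."""
    s_vis = knn_novelty_score(train_visual, query_visual, k)
    s_lat = knn_novelty_score(train_latent, query_latent, k)
    score = alpha * s_vis + (1.0 - alpha) * s_lat
    return score > threshold  # True -> predicted unseen domain
```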
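The second step fuses semantic modalities with a GCN: one primary modality supplies the node features while another modality defines the graph over classes. Below is a minimal sketch under assumed choices (attributes as node features, word embeddings for the adjacency, a two-layer GCN that outputs one visual classifier per class); layer sizes, the normalization scheme, and the training objective are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_adjacency(aux_embeddings, top_k=10):
    """Class graph from an auxiliary modality (e.g., word embeddings): connect each
    class to its top-k cosine neighbours, add self-loops, symmetrically normalize."""
    z = F.normalize(aux_embeddings, dim=1)
    sim = z @ z.t()
    n = sim.size(0)
    adj = torch.zeros_like(sim)
    adj.scatter_(1, sim.topk(top_k, dim=1).indices, 1.0)
    adj = ((adj + adj.t() + torch.eye(n)) > 0).float()
    d_inv_sqrt = torch.diag(adj.sum(1).pow(-0.5))
    return d_inv_sqrt @ adj @ d_inv_sqrt

class FusionGCN(nn.Module):
    """Two-layer GCN: node features come from the primary modality (e.g., attributes);
    the output row for each class is interpreted as a visual classifier."""
    def __init__(self, in_dim, hid_dim, visual_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, visual_dim)

    def forward(self, primary_modality, adj):
        h = F.leaky_relu(self.w1(adj @ primary_modality))
        classifiers = self.w2(adj @ h)        # [num_classes, visual_dim]
        return F.normalize(classifiers, dim=1)
```

One common way to train such a model is to regress the output rows of seen classes onto classifier weights taken from a pretrained CNN, then classify a test image by the dot product between its visual feature and all predicted class classifiers; whether LND-GMF uses exactly this objective is not specified in the abstract.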