Multi-Label Zero-Shot Learning With Adversarial and Variational Techniques

Cited: 0
Authors
Gull, Muqaddas [1 ]
Arif, Omar [1 ,2 ]
Affiliations
[1] Natl Univ Sci & Technol NUST, Sch Elect Engn & Comp Sci, Islamabad 44000, Pakistan
[2] Amer Univ Sharjah, Dept Comp Sci & Engn, Sharjah, U Arab Emirates
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Semantics; Visualization; Generative adversarial networks; Task analysis; Data models; Zero-shot learning; Training; Encoders; Conditional variational autoencoder; conditional generative adversarial network; generalized zero-shot learning; regressor; zero-shot learning;
DOI
10.1109/ACCESS.2024.3425547
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Multi-label zero-shot learning extends the traditional single-label zero-shot learning paradigm by addressing the challenge of accurately classifying images that contain multiple unseen classes, i.e., classes not present in the training data. Current techniques rely on attention mechanisms to tackle the complexities of multi-label zero-shot learning (ZSL) and generalized zero-shot learning (GZSL). However, feature generation, particularly via generative approaches, remains largely unexplored in the multi-label setting. In this paper, we propose a generative approach that combines a Conditional Variational Autoencoder (CVAE) with a Conditional Generative Adversarial Network (CGAN) to improve the quality of generated features for both multi-label ZSL and GZSL. Additionally, we introduce a novel "Regressor" as a supplementary tool to improve the reconstruction of visual features. This Regressor operates in conjunction with a "cycle-consistency loss" to ensure that the generated features preserve the key qualities of the original features even after undergoing transformations. To evaluate the proposed approach, we conducted comprehensive experiments on two widely recognized benchmark datasets, NUS-WIDE and MS COCO, covering both multi-label ZSL and GZSL scenarios. Our approach yields notable gains in mean Average Precision (mAP) on both datasets: a 0.2% increase on NUS-WIDE and a 2.6% improvement on MS COCO for multi-label ZSL. These results demonstrate that our generative approach outperforms existing methods on these widely recognized benchmarks.
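To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch of how a CVAE/CGAN hybrid with a regressor and cycle-consistency loss could be wired together. All module names, layer sizes, loss weights, and feature dimensions here are illustrative assumptions, not the authors' actual implementation or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed dimensions for illustration only; the paper's sizes will differ.
VIS_DIM, SEM_DIM, LATENT_DIM = 512, 300, 64

class Encoder(nn.Module):
    """q(z | x, s): encodes a visual feature and its semantic embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(VIS_DIM + SEM_DIM, 256), nn.ReLU())
        self.mu = nn.Linear(256, LATENT_DIM)
        self.logvar = nn.Linear(256, LATENT_DIM)
    def forward(self, x, s):
        h = self.net(torch.cat([x, s], dim=1))
        return self.mu(h), self.logvar(h)

class Generator(nn.Module):
    """p(x | z, s): CVAE decoder, doubling as the CGAN generator."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM + SEM_DIM, 256),
                                 nn.ReLU(), nn.Linear(256, VIS_DIM))
    def forward(self, z, s):
        return self.net(torch.cat([z, s], dim=1))

class Discriminator(nn.Module):
    """D(x, s): real/fake score for a feature, conditioned on semantics."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(VIS_DIM + SEM_DIM, 256),
                                 nn.LeakyReLU(0.2), nn.Linear(256, 1))
    def forward(self, x, s):
        return self.net(torch.cat([x, s], dim=1))

class Regressor(nn.Module):
    """Maps a (generated) visual feature back to its semantic embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(VIS_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, SEM_DIM))
    def forward(self, x):
        return self.net(x)

def generator_step(enc, gen, dis, reg, x, s):
    """One generator-side update combining the CVAE ELBO, the CGAN
    adversarial term, and cycle-consistency through the regressor."""
    mu, logvar = enc(x, s)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
    x_hat = gen(z, s)
    recon = F.mse_loss(x_hat, x)                              # CVAE reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL term
    adv = F.binary_cross_entropy_with_logits(                 # fool the discriminator
        dis(x_hat, s), torch.ones(x.size(0), 1))
    cyc = F.mse_loss(reg(x_hat), s)                           # cycle-consistency
    return recon + kl + adv + cyc  # loss weights omitted; assumed to be tuned

if __name__ == "__main__":
    enc, gen, dis, reg = Encoder(), Generator(), Discriminator(), Regressor()
    x = torch.randn(8, VIS_DIM)   # dummy visual features
    s = torch.randn(8, SEM_DIM)   # dummy multi-label semantic embeddings
    loss = generator_step(enc, gen, dis, reg, x, s)
    loss.backward()
    print(loss.item())
```

In this sketch the regressor closes the loop from generated features back to the conditioning semantics, which is one common way a cycle-consistency loss is realized in generative ZSL pipelines; the discriminator would be trained in an alternating step, omitted here for brevity.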
Pages: 94990-95006
Page count: 17