Diverse and tailored image generation for zero-shot multi-label classification

Cited by: 1
Authors
Zhang, Kaixin [1]
Yuan, Zhixiang [1]
Huang, Tao [2]
Affiliations
[1] Anhui Univ Technol, Sch Comp Sci & Technol, 1530 Maxiang Rd, Maanshan 243032, Anhui, Peoples R China
[2] Univ Sydney, Fac Engn, Sch Comp Sci, J12 Cleveland St, Camperdown, NSW 2008, Australia
Keywords
Zero-shot multi-label learning; Deep generative model; Diffusion model; Synthetic data; Transformer
DOI
10.1016/j.knosys.2024.112077
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Recently, zero-shot multi-label classification has garnered considerable attention owing to its capacity to predict unseen labels without human annotations. Nevertheless, prevailing approaches often use seen classes as imperfect proxies for unseen classes, resulting in suboptimal performance. Drawing inspiration from the success of text-to-image generation models in producing realistic images, we propose an innovative solution: generating synthetic data to construct a training set explicitly tailored for proxyless training of unseen labels. Our approach introduces a novel image generation framework that produces multi-label synthetic images of unseen classes for classifier training. To enhance the diversity of the generated images, we leverage a pretrained large language model to generate diverse prompts. Employing a pretrained multimodal contrastive language-image pretraining (CLIP) model as a discriminator, we assess whether the generated images accurately represent the target classes. This enables automatic filtering of inaccurately generated images and preserves classifier accuracy. To refine the text prompts for more precise and effective multi-label object generation, we introduce a CLIP score-based discriminative loss to fine-tune the text encoder in the diffusion model. In addition, to enhance the visual features of the target task while maintaining the generalization of the original features and mitigating the catastrophic forgetting caused by fine-tuning the entire visual encoder, we propose a feature fusion module inspired by transformer attention mechanisms, which captures the global dependencies between multiple objects more effectively. Extensive experimental results validate the effectiveness of our approach, demonstrating significant improvements over state-of-the-art methods. The code is available at https://github.com/TAKELAMAG/Diff-ZS-MLC.
Pages: 14
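
The CLIP-as-discriminator filtering step described in the abstract can be illustrated with a minimal sketch: score each synthetic image against every label it was generated to contain and keep it only if all labels are matched. The checkpoint name, the "a photo of a ..." prompt template, and the 0.25 similarity threshold below are illustrative assumptions, not the values used in the paper; see the linked repository for the authors' implementation.

```python
# Minimal sketch: filter synthetic multi-label images with CLIP similarity scores.
# The model checkpoint, prompt template, and threshold are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def keep_image(image: Image.Image, target_labels: list[str], threshold: float = 0.25) -> bool:
    """Return True only if the image matches every intended label well enough."""
    prompts = [f"a photo of a {label}" for label in target_labels]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
        # image_embeds: (1, dim), text_embeds: (num_labels, dim); both are L2-normalized,
        # so the dot product is the cosine similarity between image and each prompt.
        sims = (outputs.image_embeds @ outputs.text_embeds.T).squeeze(0)
    # A multi-label image is kept only if every requested object appears to be present.
    return bool((sims > threshold).all())
```

Images that fail this check would be discarded before classifier training, which is the "automatic filtering of inaccurately generated images" the abstract refers to.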
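
The feature fusion module is described only at a high level, as an attention-inspired block that combines task-adapted visual features with the original (generalizable) features. Below is a minimal sketch of one plausible realization using cross-attention in PyTorch; the class name, dimensions, and residual design are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch: attention-style fusion of frozen CLIP visual features with
# task-adapted features, so adaptation does not overwrite the original representation.
# Layer sizes and the residual layout are illustrative assumptions.
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # The adapted branch attends to the frozen branch (cross-attention),
        # letting the module model global dependencies between multiple objects.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, adapted: torch.Tensor, frozen: torch.Tensor) -> torch.Tensor:
        # adapted, frozen: (batch, num_tokens, dim) token/patch features.
        fused, _ = self.cross_attn(query=adapted, key=frozen, value=frozen)
        # Residual connection keeps the adapted features as the primary signal.
        return self.norm(adapted + fused)

# Usage: fuse token features from a frozen CLIP encoder with a fine-tuned branch.
fusion = FeatureFusion(dim=512, num_heads=8)
adapted = torch.randn(2, 50, 512)   # fine-tuned branch output (hypothetical shapes)
frozen = torch.randn(2, 50, 512)    # frozen CLIP branch output
out = fusion(adapted, frozen)       # (2, 50, 512)
```

Keeping the frozen branch intact is what mitigates catastrophic forgetting: only the fusion block and the adapted branch are trained, while the original CLIP features remain available at every forward pass.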