Diverse and tailored image generation for zero-shot multi-label classification

Cited by: 1
Authors
Zhang, Kaixin [1]
Yuan, Zhixiang [1]
Huang, Tao [2]
Affiliations
[1] Anhui Univ Technol, Sch Comp Sci & Technol, 1530 Maxiang Rd, Maanshan 243032, Anhui, Peoples R China
[2] Univ Sydney, Fac Engn, Sch Comp Sci, J12 Cleveland St, Camperdown, NSW 2008, Australia
Keywords
Zero-shot multi-label learning; Deep generative model; Diffusion model; Synthetic data; Transformer
DOI
10.1016/j.knosys.2024.112077
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, zero-shot multi-label classification has garnered considerable attention owing to its capacity to predict unseen labels without human annotations. Nevertheless, prevailing approaches often use seen classes as imperfect proxies for unseen classes, resulting in suboptimal performance. Drawing inspiration from the success of text-to-image generation models in producing realistic images, we propose generating synthetic data to construct a training set explicitly tailored for proxyless training of unseen labels. Our approach introduces a novel image generation framework that produces multi-label synthetic images of unseen classes for classifier training. To enhance the diversity of the generated images, we leverage a pretrained large language model to generate diverse prompts. Using a pretrained multimodal contrastive language-image pretraining (CLIP) model as a discriminator, we assess whether the generated images accurately represent the target classes. This enables automatic filtering of inaccurately generated images and preserves classifier accuracy. To refine the text prompts for more precise and effective multi-label object generation, we introduce a CLIP score-based discriminative loss to fine-tune the text encoder in the diffusion model. In addition, to enhance the visual features of the target task while maintaining the generalization of the original features and mitigating the catastrophic forgetting that results from fine-tuning the entire visual encoder, we propose a feature fusion module inspired by transformer attention mechanisms. This module helps capture the global dependencies between multiple objects more effectively. Extensive experimental results validate the effectiveness of our approach, demonstrating significant improvements over state-of-the-art methods. The code is available at https://github.com/TAKELAMAG/Diff-ZS-MLC.
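The CLIP-based filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the exact acceptance criterion, so the rule used here (keep an image only if its similarity to every target label passes a threshold) is an assumption. The threshold value, the function names, and the toy three-dimensional vectors standing in for CLIP image and text embeddings are also hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def keep_image(image_emb, label_embs, target_labels, threshold=0.25):
    """Assumed rule: accept the image only if it is similar enough
    to the text embedding of every label it was generated for."""
    return all(
        cosine(image_emb, label_embs[label]) >= threshold
        for label in target_labels
    )

def filter_images(samples, label_embs, threshold=0.25):
    """Drop generated samples that fail the similarity check."""
    return [
        s for s in samples
        if keep_image(s["emb"], label_embs, s["labels"], threshold)
    ]

# Toy stand-ins for CLIP text embeddings of two unseen labels (hypothetical).
label_embs = {"dog": [1.0, 0.0, 0.0], "ball": [0.0, 1.0, 0.0]}

# Toy stand-ins for CLIP image embeddings of two generated images (hypothetical).
samples = [
    {"id": "a", "emb": [0.7, 0.7, 0.1], "labels": ["dog", "ball"]},   # shows both objects
    {"id": "b", "emb": [0.9, 0.05, 0.4], "labels": ["dog", "ball"]},  # the ball is missing
]

kept = filter_images(samples, label_embs)
kept_ids = [s["id"] for s in kept]
```

In a real pipeline the embeddings would come from a pretrained CLIP image encoder and text encoder, and the threshold would need to be tuned on held-out data.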
Pages: 14