Diverse and tailored image generation for zero-shot multi-label classification

Cited by: 1
Authors
Zhang, Kaixin [1]
Yuan, Zhixiang [1]
Huang, Tao [2]
Affiliations
[1] Anhui Univ Technol, Sch Comp Sci & Technol, 1530 Maxiang Rd, Maanshan 243032, Anhui, Peoples R China
[2] Univ Sydney, Fac Engn, Sch Comp Sci, J12 Cleveland St, Camperdown, NSW 2008, Australia
Keywords
Zero-shot multi-label learning; Deep generative model; Diffusion model; Synthetic data; TRANSFORMER;
DOI
10.1016/j.knosys.2024.112077
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, zero-shot multi-label classification has garnered considerable attention owing to its capacity to predict unseen labels without human annotations. Nevertheless, prevailing approaches often use seen classes as imperfect proxies for unseen classes, resulting in suboptimal performance. Drawing inspiration from the success of text-to-image generation models in producing realistic images, we propose an innovative solution: generating synthetic data to construct a training set explicitly tailored for proxyless training of unseen labels. Our approach introduces a novel image generation framework that produces multi-label synthetic images of unseen classes for classifier training. To enhance the diversity of the generated images, we leverage a pretrained large language model to generate diverse prompts. Employing a pretrained multimodal contrastive language-image pretraining (CLIP) model as a discriminator, we assess whether the generated images accurately represent the target classes. This enables automatic filtering of inaccurately generated images and preserves classifier accuracy. To refine the text prompts for more precise and effective multi-label object generation, we introduce a CLIP score-based discriminative loss to fine-tune the text encoder in the diffusion model. In addition, to enhance the visual features of the target task while maintaining the generalization of the original features and mitigating catastrophic forgetting resulting from fine-tuning the entire visual encoder, we propose a feature fusion module inspired by transformer attention mechanisms. This module aids in capturing the global dependencies between multiple objects more effectively. Extensive experimental results validate the effectiveness of our approach, demonstrating significant improvements over state-of-the-art methods. The code is available at https://github.com/TAKELAMAG/Diff-ZS-MLC.
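The filtering step described in the abstract (using CLIP as a discriminator to discard inaccurately generated images) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not state the acceptance criterion, so the rule used here (every target label must reach a cosine-similarity threshold) is an assumption, as are the function names `cosine`, `keep_image`, and `filter_generated`, the threshold value, and the toy three-dimensional vectors that stand in for real CLIP image and text embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def keep_image(image_emb, label_embs, target_labels, threshold=0.25):
    """Assumed rule: keep the image only if it is similar enough to
    the text embedding of EVERY label it was generated for."""
    return all(
        cosine(image_emb, label_embs[label]) >= threshold
        for label in target_labels
    )


def filter_generated(samples, label_embs, threshold=0.25):
    """Drop synthetic samples that fail the similarity check."""
    return [
        s for s in samples
        if keep_image(s["embedding"], label_embs, s["labels"], threshold)
    ]


# Toy stand-ins for CLIP text embeddings of two unseen labels (hypothetical).
label_embs = {"dog": [1.0, 0.0, 0.0], "frisbee": [0.0, 1.0, 0.0]}

# Toy stand-ins for CLIP image embeddings of two generated images (hypothetical).
samples = [
    {"id": "img_0", "embedding": [0.7, 0.7, 0.1], "labels": ["dog", "frisbee"]},
    {"id": "img_1", "embedding": [0.9, 0.05, 0.4], "labels": ["dog", "frisbee"]},
]

kept = filter_generated(samples, label_embs)
kept_ids = [s["id"] for s in kept]
print(kept_ids)  # img_1 barely matches "frisbee", so only img_0 survives
```

In practice the embeddings would come from a pretrained CLIP image and text encoder, and the threshold would need tuning; the repository linked in the abstract is the place to check how the authors actually score and filter images.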
Pages: 14
Related papers
50 in total
  • [1] Deep Ranking for Image Zero-Shot Multi-Label Classification
    Ji, Zhong
    Cui, Biying
    Li, Huihui
    Jiang, Yu-Gang
    Xiang, Tao
    Hospedales, Timothy
    Fu, Yanwei
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 (29) : 6549 - 6560
  • [2] MULTI-LABEL ZERO-SHOT AUDIO CLASSIFICATION WITH TEMPORAL ATTENTION
    Dogan, Duygu
    Xie, Huang
    Heittola, Toni
    Virtanen, Tuomas
    2024 18TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT, IWAENC 2024, 2024, : 250 - 254
  • [3] Semantic Diversity Learning for Zero-Shot Multi-label Classification
    Ben-Cohen, Avi
    Zamir, Nadav
    Ben Baruch, Emanuel
    Friedman, Itamar
    Zelnik-Manor, Lihi
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 620 - 630
  • [4] MULTI-LABEL AUDIO CLASSIFICATION WITH A NOISY ZERO-SHOT TEACHER
    Braun, Sebastian
    Gamper, Hannes
    2024 18TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT, IWAENC 2024, 2024, : 240 - 244
  • [5] Generative Multi-Label Zero-Shot Learning
    Gupta, Akshita
    Narayan, Sanath
    Khan, Salman
    Khan, Fahad Shahbaz
    Shao, Ling
    van de Weijer, Joost
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (12) : 14611 - 14624
  • [6] Zero-shot multi-label learning via label factorisation
    Shao, Hang
    Guo, Yuchen
    Ding, Guiguang
    Han, Jungong
    IET COMPUTER VISION, 2019, 13 (02) : 117 - 124
  • [7] Pairnorm based Graphical Convolution Network for zero-shot multi-label classification
    Chauhan, Vikas
    Tiwari, Aruna
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 114
  • [8] Generalized Zero-Shot Extreme Multi-label Learning
    Gupta, Nilesh
    Bohra, Sakina
    Prabhu, Yashoteja
    Purohit, Saurabh
    Varma, Manik
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 527 - 535
  • [9] A Probabilistic Framework for Zero-Shot Multi-Label Learning
    Gaure, Abhilash
    Gupta, Aishwarya
    Verma, Vinay Kumar
    Rai, Piyush
    CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI2017), 2017,
  • [10] Zero-Shot Facial Expression Recognition with Multi-label Label Propagation
    Lu, Zijia
    Zeng, Jiabei
    Shan, Shiguang
    Chen, Xilin
    COMPUTER VISION - ACCV 2018, PT III, 2019, 11363 : 19 - 34