Leveraging Self-Distillation and Disentanglement Network to Enhance Visual-Semantic Feature Consistency in Generalized Zero-Shot Learning

Cited by: 0
Authors
Liu, Xiaoming [1 ,2 ,3 ]
Wang, Chen [1 ,2 ]
Yang, Guan [1 ,2 ]
Wang, Chunhua [4 ]
Long, Yang [5 ]
Liu, Jie [3 ,6 ]
Zhang, Zhiyuan [1 ,2 ]
Affiliations
[1] Zhongyuan Univ Technol, Sch Comp Sci, Zhengzhou 450007, Peoples R China
[2] Zhengzhou Key Lab Text Proc & Image Understanding, Zhengzhou 450007, Peoples R China
[3] Res Ctr Language Intelligence China, Beijing 100089, Peoples R China
[4] Huanghuai Univ, Sch Animat Acad, Zhumadian 463000, Peoples R China
[5] Univ Durham, Dept Comp Sci, Durham DH1 3LE, England
[6] North China Univ Technol, Sch Informat Sci, Beijing 100144, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
generalized zero-shot learning; self-distillation; disentanglement network; visual-semantic feature consistency;
DOI
10.3390/electronics13101977
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Generalized zero-shot learning (GZSL) aims to recognize both seen and unseen classes while training only on seen-class samples and auxiliary semantic descriptions. Recent state-of-the-art methods either infer unseen classes from semantic information or synthesize unseen-class samples with semantically conditioned generative models, and all of them rely on correctly aligned visual-semantic features. However, they often overlook the inconsistency between the original visual features and the semantic attributes. Moreover, because of cross-modal dataset biases, the visual features that the model extracts or synthesizes may mismatch some semantic features, which hinders proper visual-semantic alignment. To address this issue, this paper proposes a GZSL framework that enhances visual-semantic feature consistency with a self-distillation and disentanglement network (SDDN). The goal is to obtain semantically consistent, refined visual features and non-redundant semantic features. First, SDDN applies self-distillation to refine the visual features that the model extracts and synthesizes. The visual and semantic features are then disentangled and aligned by a disentanglement network to enhance their consistency. Finally, the consistent visual-semantic features are fused to jointly train a GZSL classifier. Extensive experiments demonstrate that the proposed method achieves competitive results on four challenging benchmark datasets (AWA2, CUB, FLO, and SUN). A minimal illustrative sketch of the three-stage pipeline follows.
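The abstract outlines three stages: self-distillation to refine visual features, a disentanglement network to align visual and semantic features, and fusion of the consistent features to train the GZSL classifier. The sketch below is a minimal, hypothetical PyTorch-style rendering of those stages only; the module names (SDDNSketch, refiner, vis_enc, sem_enc), dimensions, and loss formulations are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch of the SDDN pipeline stages named in the abstract.
# All architectural details and losses here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SDDNSketch(nn.Module):
    def __init__(self, vis_dim=2048, sem_dim=312, latent_dim=512, num_classes=200):
        super().__init__()
        # Self-distillation branch: a "student" refiner whose output is pulled
        # toward a teacher view of the visual features (e.g., an earlier snapshot).
        self.refiner = nn.Sequential(nn.Linear(vis_dim, vis_dim), nn.ReLU(),
                                     nn.Linear(vis_dim, vis_dim))
        # Disentanglement/alignment: project visual and semantic inputs into a
        # shared latent space where consistency can be enforced.
        self.vis_enc = nn.Linear(vis_dim, latent_dim)
        self.sem_enc = nn.Linear(sem_dim, latent_dim)
        # Classifier trained on the fused (concatenated) consistent features.
        self.classifier = nn.Linear(2 * latent_dim, num_classes)

    def forward(self, vis, sem, teacher_vis=None):
        refined = self.refiner(vis)          # stage 1: refine visual features
        v = self.vis_enc(refined)            # stage 2: encode both modalities
        s = self.sem_enc(sem)
        logits = self.classifier(torch.cat([v, s], dim=-1))  # stage 3: fuse + classify

        losses = {}
        if teacher_vis is not None:
            # Self-distillation loss: match the (detached) teacher features.
            losses["distill"] = F.mse_loss(refined, teacher_vis.detach())
        # Alignment loss: encourage visual-semantic consistency in latent space.
        losses["align"] = F.mse_loss(F.normalize(v, dim=-1), F.normalize(s, dim=-1))
        return logits, losses
```

Under these assumptions, a training step would sum the classification loss with the distillation and alignment terms; the actual weighting and the form of the disentanglement objective would follow the paper.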
Pages: 18