Leveraging Self-Distillation and Disentanglement Network to Enhance Visual-Semantic Feature Consistency in Generalized Zero-Shot Learning

Cited by: 0
Authors
Liu, Xiaoming [1 ,2 ,3 ]
Wang, Chen [1 ,2 ]
Yang, Guan [1 ,2 ]
Wang, Chunhua [4 ]
Long, Yang [5 ]
Liu, Jie [3 ,6 ]
Zhang, Zhiyuan [1 ,2 ]
Affiliations
[1] Zhongyuan Univ Technol, Sch Comp Sci, Zhengzhou 450007, Peoples R China
[2] Zhengzhou Key Lab Text Proc & Image Understanding, Zhengzhou 450007, Peoples R China
[3] Res Ctr Language Intelligence China, Beijing 100089, Peoples R China
[4] Huanghuai Univ, Sch Animat Acad, Zhumadian 463000, Peoples R China
[5] Univ Durham, Dept Comp Sci, Durham DH1 3LE, England
[6] North China Univ Technol, Sch Informat Sci, Beijing 100144, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
generalized zero-shot learning; self-distillation; disentanglement network; visual-semantic feature consistency;
DOI
10.3390/electronics13101977
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Generalized zero-shot learning (GZSL) aims to recognize both seen and unseen classes while training only on seen-class samples and auxiliary semantic descriptions. Recent state-of-the-art methods either infer unseen classes from semantic information or synthesize unseen-class samples with semantically conditioned generative models, and all of them rely on correctly aligned visual-semantic features. However, they often overlook the inconsistency between the original visual features and the semantic attributes. Moreover, because of cross-modal dataset biases, the visual features that the model extracts or synthesizes may mismatch some semantic features, which hinders proper visual-semantic alignment. To address this issue, this paper proposes a GZSL framework that enhances visual-semantic feature consistency with a self-distillation and disentanglement network (SDDN). The goal is to obtain semantically consistent, refined visual features and non-redundant semantic features. First, SDDN applies self-distillation to refine the visual features that the model extracts and synthesizes. The visual and semantic features are then disentangled and aligned by a disentanglement network to enhance their consistency. Finally, the consistent visual-semantic features are fused to jointly train a GZSL classifier. Extensive experiments demonstrate that the proposed method achieves competitive results on four challenging benchmark datasets (AWA2, CUB, FLO, and SUN). A minimal illustrative sketch of the three-stage pipeline follows.
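The abstract outlines three stages: self-distillation to refine visual features, a disentanglement network to align visual and semantic features, and fusion of the consistent features to train the GZSL classifier. The sketch below is a minimal, hypothetical PyTorch-style rendering of those stages only; the module names (SDDNSketch, refiner, vis_enc, sem_enc), dimensions, and loss formulations are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch of the SDDN pipeline stages named in the abstract.
# All architectural details and losses here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SDDNSketch(nn.Module):
    def __init__(self, vis_dim=2048, sem_dim=312, latent_dim=512, num_classes=200):
        super().__init__()
        # Self-distillation branch: a "student" refiner whose output is pulled
        # toward a teacher view of the visual features (e.g., an earlier snapshot).
        self.refiner = nn.Sequential(nn.Linear(vis_dim, vis_dim), nn.ReLU(),
                                     nn.Linear(vis_dim, vis_dim))
        # Disentanglement/alignment: project visual and semantic inputs into a
        # shared latent space where consistency can be enforced.
        self.vis_enc = nn.Linear(vis_dim, latent_dim)
        self.sem_enc = nn.Linear(sem_dim, latent_dim)
        # Classifier trained on the fused (concatenated) consistent features.
        self.classifier = nn.Linear(2 * latent_dim, num_classes)

    def forward(self, vis, sem, teacher_vis=None):
        refined = self.refiner(vis)          # stage 1: refine visual features
        v = self.vis_enc(refined)            # stage 2: encode both modalities
        s = self.sem_enc(sem)
        logits = self.classifier(torch.cat([v, s], dim=-1))  # stage 3: fuse + classify

        losses = {}
        if teacher_vis is not None:
            # Self-distillation loss: match the (detached) teacher features.
            losses["distill"] = F.mse_loss(refined, teacher_vis.detach())
        # Alignment loss: encourage visual-semantic consistency in latent space.
        losses["align"] = F.mse_loss(F.normalize(v, dim=-1), F.normalize(s, dim=-1))
        return logits, losses
```

Under these assumptions, a training step would sum the classification loss with the distillation and alignment terms; the actual weighting and the form of the disentanglement objective would follow the paper.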
Pages: 18