Adaptive Conditional Denoising Diffusion Model With Hybrid Affinity Regularizer for Generalized Zero-Shot Learning

Cited by: 5

Authors:

Gao, Mengyu ^{[1,2,3]}

Dong, Qiulei ^{[1,2,3]}

Affiliations:

[1] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 100190, Peoples R China

[2] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 100190, Peoples R China

[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 07

Funding:

National Natural Science Foundation of China;

Keywords:

Visualization; Noise reduction; Adaptation models; Semantics; Task analysis; Circuits and systems; Zero-shot learning; Denoising diffusion model; adaptive feature synthesis; generalized zero-shot learning; NETWORK;

DOI:

10.1109/TCSVT.2024.3359238

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Generalized zero-shot learning (GZSL) is a challenging topic in both computer vision and machine learning. Recently, generative models (e.g., GAN and VAE) have attracted much attention for handling the GZSL task; however, they are sometimes prone to either model collapse or ambiguous distribution modeling. Inspired by the feature generation ability of denoising diffusion models in other visual tasks, we propose an Adaptive Conditional Denoising Diffusion Model, called AC-DDM, to synthesize unseen-class visual features for GZSL conditioned on a set of semantic features. Unlike traditional denoising diffusion models, whose reverse process has both a fixed time interval and a fixed number of total denoising time steps, the proposed AC-DDM has a learnable distribution-constrained predictor that can adaptively learn the time interval and the number of total denoising time steps for each unseen class, so that it can synthesize more discriminative features for sample classification. To further improve the discrimination ability of the synthesized visual features, we also explore a hybrid affinity regularizer under the proposed AC-DDM, which forces the differences among the affinity matrices of the real and synthesized visual features to be small. Extensive experimental results on four public benchmark datasets demonstrate the superiority of the proposed model over 20 state-of-the-art models in both the ZSL and GZSL tasks.

Citation

Pages: 5641-5652

Page count: 12

55 items in total

[1] Label-Embedding for Image Classification [J].

Akata, Zeynep ;

Perronnin, Florent ;

Harchaoui, Zaid ;

Schmid, Cordelia .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (07) :1425-1438

[2]

Ardizzone L, 2019, Arxiv, DOI arXiv:1808.04730

[3]

Arjovsky M, 2017, Arxiv, DOI [arXiv:1701.07875, 10.48550/arXiv.1701.07875]

[4] Hardness Sampling for Self-Training Based Transductive Zero-Shot Learning [J].

Bo, Liu ;

Dong, Qiulei ;

Hu, Zhanyi .

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :16494-16503

[5] No Adversaries to Zero-Shot Learning: Distilling an Ensemble of Gaussian Feature Generators [J].

Cavazza, Jacopo ;

Murino, Vittorio ;

Bue, Alessio Del .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) :12167-12178

[6] TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning [J].

Chen, Shiming ;

Hong, Ziming ;

Hou, Wenjin ;

Xie, Guo-Sen ;

Song, Yibing ;

Zhao, Jian ;

You, Xinge ;

Yan, Shuicheng ;

Shao, Ling .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) :12844-12861

[7]

Chen SM, 2021, ADV NEUR IN, V34

[8] MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning [J].

Chen, Shiming ;

Hong, Ziming ;

Xie, Guo-Sen ;

Yang, Wenhan ;

Peng, Qinmu ;

Wang, Kai ;

Zhao, Jian ;

You, Xinge .

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, :7602-7611

[9] FREE: Feature Refinement for Generalized Zero-Shot Learning [J].

Chen, Shiming ;

Wang, Wenjie ;

Xia, Beihao ;

Peng, Qinmu ;

You, Xinge ;

Zheng, Feng ;

Shao, Ling .

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :122-131

[10] GNDAN: Graph Navigated Dual Attention Network for Zero-Shot Learning [J].

Chen, Shiming ;

Hong, Ziming ;

Xie, Guosen ;

Peng, Qinmu ;

You, Xinge ;

Ding, Weiping ;

Shao, Ling .

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) :4516-4529
