Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

Cited by: 9
Authors
Hou, Wenjin [1 ,4 ]
Chen, Shiming [2 ]
Chen, Shuhuang [1 ]
Hong, Ziming [3 ]
Wang, Yan [4 ]
Feng, Xuetao [4 ]
Khan, Salman [2 ,5 ]
Khan, Fahad Shahbaz [2 ,6 ]
You, Xinge [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] Mohamed bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[3] Univ Sydney, Sydney, NSW, Australia
[4] Alibaba Grp, Hangzhou, Peoples R China
[5] Australian Natl Univ, Canberra, ACT, Australia
[6] Linkoping Univ, Linkoping, Sweden
Source
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024
Keywords
CLASSIFICATION;
DOI
10.1109/CVPR52733.2024.02230
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Generative zero-shot learning (ZSL) learns a generator to synthesize visual samples for unseen classes, which is an effective way to advance ZSL. However, existing generative methods rely on Gaussian noise and a predefined semantic prototype as conditions, which restrict the generator to being optimized only on specific seen classes rather than characterizing each visual instance, resulting in poor generalization (e.g., overfitting to seen classes). To address this issue, we propose a novel Visual-Augmented Dynamic Semantic prototype method (termed VADS) that helps the generator learn an accurate semantic-visual mapping by fully incorporating visual-augmented knowledge into the semantic conditions. In detail, VADS consists of two modules: (1) a Visual-aware Domain Knowledge Learning module (VDKL), which learns the local bias and global prior of the visual features (referred to as domain visual knowledge) and replaces pure Gaussian noise with richer prior noise information; and (2) a Vision-Oriented Semantic Updation module (VOSU), which updates the semantic prototype according to the visual representations of the samples. Ultimately, we concatenate their outputs into a dynamic semantic prototype, which serves as the condition of the generator. Extensive experiments demonstrate that our VADS achieves superior conventional (CZSL) and generalized (GZSL) zero-shot learning performance on three prominent datasets, outperforming other state-of-the-art methods with average gains of 6.4%, 5.9%, and 4.2% on SUN, CUB, and AWA2, respectively.
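The conditioning pipeline the abstract describes can be illustrated with a loose numpy sketch. Everything here is a hedged assumption for illustration only: the function names (`vdkl_noise`, `vosu_update`), the mean/std statistics, the mixing weight `alpha`, and the feature dimensions (2048-d visual features, 312-d CUB-style attribute prototypes) are placeholders, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def vdkl_noise(visual_feat, dim, rng):
    # Hypothetical stand-in for VDKL: derive "domain visual knowledge"
    # from the instance's feature statistics (a local bias plus a global
    # prior scale) instead of sampling pure Gaussian noise.
    local_bias = visual_feat.mean()
    global_prior = visual_feat.std()
    return local_bias + global_prior * rng.standard_normal(dim)

def vosu_update(prototype, visual_feat, alpha=0.1):
    # Hypothetical stand-in for VOSU: nudge the predefined class-level
    # semantic prototype toward this sample's visual representation.
    return (1 - alpha) * prototype + alpha * visual_feat[: prototype.shape[0]]

v = rng.standard_normal(2048)   # per-instance visual feature (assumed dim)
s = rng.standard_normal(312)    # predefined semantic prototype (assumed dim)

z = vdkl_noise(v, s.shape[0], rng)       # visual-aware prior noise
s_dyn = vosu_update(s, v)                # vision-updated prototype
condition = np.concatenate([z, s_dyn])   # dynamic semantic prototype
print(condition.shape)                   # generator condition vector
```

The key structural point the sketch captures is the final concatenation: the generator is conditioned on a per-instance vector rather than a fixed class prototype plus unit Gaussian noise.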
Pages: 23627-23637
Page count: 11