Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning

Cited by: 28
Authors
Liu, Man [1 ,2 ]
Li, Feng [3 ]
Zhang, Chunjie [1 ,2 ]
Wei, Yunchao [1 ,2 ]
Bai, Huihui [1 ,2 ]
Zhao, Yao [1 ,2 ]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing, Peoples R China
[2] Beijing Key Lab Adv Informat Sci & Network Techno, Beijing, Peoples R China
[3] Hefei Univ Technol, Hefei, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation; National Key R&D Program of China;
DOI
10.1109/CVPR52729.2023.01472
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Generalized Zero-Shot Learning (GZSL) identifies unseen categories using knowledge transferred from the seen domain, relying on the intrinsic interactions between visual and semantic information. Prior works mainly localize regions corresponding to the shared attributes. When various visual appearances correspond to the same attribute, the shared attributes inevitably introduce semantic ambiguity, hampering the exploration of accurate semantic-visual interactions. In this paper, we deploy a dual semantic-visual transformer module (DSVTM) to progressively model the correspondences between attribute prototypes and visual features, constituting a progressive semantic-visual mutual adaption (PSVMA) network for semantic disambiguation and improved knowledge transferability. Specifically, DSVTM devises an instance-motivated semantic encoder that learns instance-centric prototypes adapted to each image, enabling unmatched semantic-visual pairs to be recast as matched ones. Then, a semantic-motivated instance decoder strengthens accurate cross-domain interactions between the matched pairs for semantic-related instance adaption, encouraging the generation of unambiguous visual representations. Moreover, to mitigate the bias towards seen classes in GZSL, a debiasing loss is proposed to pursue response consistency between seen and unseen predictions. PSVMA consistently yields superior performance over other state-of-the-art methods. Code will be available at: https://github.com/ManLiuCoder/PSVMA.
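The encoder-decoder pattern the abstract describes maps naturally onto cross-attention: attribute prototypes first query an image's patch features to become instance-centric, then the patches query the adapted prototypes to produce semantic-related visual features. Below is a minimal PyTorch sketch of that pattern; the class name `DSVTMSketch`, the dimensions, and the use of standard multi-head attention are illustrative assumptions, not the authors' implementation (see their repository for the actual code).

```python
import torch
import torch.nn as nn

class DSVTMSketch(nn.Module):
    """Minimal sketch of one semantic-visual mutual adaption round.

    Stage 1 (instance-motivated semantic encoder): shared attribute
    prototypes attend to patch features, yielding instance-centric
    prototypes, i.e. the unmatched pair is recast as a matched one.
    Stage 2 (semantic-motivated instance decoder): patch features attend
    back to the adapted prototypes, yielding unambiguous visual features.
    Names and dimensions are illustrative, not the paper's implementation.
    """

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.sem_enc = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ins_dec = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, prototypes: torch.Tensor, patches: torch.Tensor):
        # prototypes: (B, num_attributes, dim) shared attribute prototypes
        # patches:    (B, num_patches, dim) visual patch features
        # Encoder: prototypes (queries) adapt to this image's patches.
        inst_protos, _ = self.sem_enc(prototypes, patches, patches)
        # Decoder: patches (queries) interact with the matched prototypes.
        visual, _ = self.ins_dec(patches, inst_protos, inst_protos)
        return inst_protos, visual
```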
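The abstract describes the debiasing loss only as pursuing response consistency between seen and unseen predictions. One plausible instantiation, sketched below under that assumption, penalizes gaps between the peak and mean responses of the two class groups so seen classes cannot dominate; the function `debias_loss` and its exact form are hypothetical and may differ from the loss used in the paper.

```python
import torch
import torch.nn.functional as F

def debias_loss(logits: torch.Tensor, seen_mask: torch.Tensor) -> torch.Tensor:
    """Hedged sketch of a debiasing objective: encourage consistent
    response statistics over seen vs. unseen classes.

    logits:    (B, C) class scores over all classes
    seen_mask: (C,) bool tensor, True for seen classes
    """
    probs = F.softmax(logits, dim=-1)
    seen_resp = probs[:, seen_mask]      # responses on seen classes
    unseen_resp = probs[:, ~seen_mask]   # responses on unseen classes
    # Match both the peak and the mean response of the two groups.
    gap_max = (seen_resp.max(dim=-1).values
               - unseen_resp.max(dim=-1).values).pow(2)
    gap_mean = (seen_resp.mean(dim=-1) - unseen_resp.mean(dim=-1)).pow(2)
    return (gap_max + gap_mean).mean()
```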
Pages: 15337-15346
Number of pages: 10