Towards Discriminative Feature Generation for Generalized Zero-Shot Learning

Cited by: 1

Authors:

Ge, Jiannan [1]

Xie, Hongtao [1]

Li, Pandeng [1]

Xie, Lingxi [2]

Min, Shaobo [3]

Zhang, Yongdong [1]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Brain inspired Intelligence Technol, Hefei 230026, Peoples R China

[2] Huawei Cloud, Shenzhen 518100, Peoples R China

[3] Tencent, Shenzhen 518000, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Keywords:

Semantics; Training; Visualization; Feature extraction; Zero-shot learning; Noise; Generators; Recognition; Multi-modality embedding; Localization

DOI:

10.1109/TMM.2024.3408048

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Generalized Zero-Shot Learning (GZSL) aims to recognize both seen and unseen categories by establishing visual and semantic relations. Recently, generation-based methods that synthesize fictitious visual features from the corresponding attributes have gained significant attention. However, these generated features often lack discriminative capability because the generative model is inadequately trained. To address this issue, we propose a novel Discriminative Enhanced Network (DENet) that harnesses the potential of the generative model by adapting the training features and imposing constraints on the generated features. Our approach incorporates three pivotal modules. 1) Before the generative network is trained, we apply a Pre-Tuning Module (PTM) to eliminate irrelevant background noise from the raw features extracted by a fixed CNN backbone. PTM thus provides tuned training features, free of redundant noise, for the generative model. 2) During generative network training, we propose an Asymmetry Cross-authenticity Contrastive (AC2) loss that groups visual features of the same category while repelling features from different categories by optimizing a large number of sample pairs. Additionally, we incorporate intra-class and relation-specific inter-class boundaries within the AC2 loss to enrich sample diversity and preserve valid semantic information. 3) Also during generative network training, a Dual-semantic Alignment Module (DAM) aligns visual features with both attributes and label embeddings, enabling the model to learn attribute-related information and discriminative extended semantics. Experiments on four standard benchmarks demonstrate that our approach learns more discriminative features and surpasses existing methods.


Pages: 10514-10529

Page count: 16

