Dual-focus transfer network for zero-shot learning

Cited by: 10
Authors
Jia, Zhen [1 ,2 ]
Zhang, Zhang [1 ,2 ]
Shan, Caifeng [3 ]
Wang, Liang [1 ,2 ]
Tan, Tieniu [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing, Peoples R China
[2] CASIA, CRIPAC, MAIS, Beijing, Peoples R China
[3] Shandong Univ Sci & Technol, Coll Elect Engn & Automat, Qingdao, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Zero-shot learning; Attention mechanism; Multi-head attention; Self-attention; Image classification;
DOI
10.1016/j.neucom.2023.126264
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Zero-shot learning aims to recognize image categories that are "unseen" during the training phase of image classification models. The key to this task is transferring knowledge learned from "seen" classes to "unseen" classes. To make this knowledge transfer more effective, we propose to exploit visual and semantic attention mechanisms simultaneously in zero-shot learning tasks. Specifically, a dual-focus transfer network (DFTN) model is proposed to implement attention from both the visual and semantic ends within a mapping-based zero-shot learning framework, using a visual focus transfer (VFT) module and a semantic focus transfer (SFT) module. The VFT module is composed of multi-head self-attention networks, which endow salient parts of images with greater weights at different resolutions of the feature maps. The SFT module generates semantic weights to re-weight semantic attribute features under the guidance of visual representations, so that semantic attributes with stronger visual discrimination ability receive greater weights. Extensive experiments on zero-shot learning and generalized zero-shot learning over five representative benchmarks demonstrate the superiority of the proposed DFTN model compared to other state-of-the-art methods. (c) 2023 Elsevier B.V. All rights reserved.
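The two attention mechanisms described in the abstract can be illustrated with a minimal numpy sketch. This is a toy illustration of the general ideas (self-attention over spatial features, and visually guided re-weighting of semantic attributes), not the paper's DFTN architecture; all projection matrices, dimensions, and function names here are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feats, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over spatial positions
    (the VFT idea; a multi-head version would split the channel dimension
    into several heads and run this in parallel).

    feats: (n, d) flattened feature-map positions; Wq/Wk/Wv: (d, d)
    hypothetical learned projections.
    """
    q, k, v = feats @ Wq, feats @ Wk, feats @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (n, n) position weights
    return attn @ v  # salient positions contribute more to each output

def semantic_focus(visual, attrs, Wg):
    """Re-weight semantic attribute features under visual guidance
    (the SFT idea).

    visual: (d,) visual embedding; attrs: (m,) attribute vector;
    Wg: (d, m) hypothetical projection scoring each attribute's
    visual discriminability.
    """
    weights = softmax(visual @ Wg)  # (m,) attention over attributes
    return weights * attrs          # discriminative attributes dominate

# Toy run: a 7x7 feature map with 8 channels and 5 semantic attributes.
n, d, m = 49, 8, 5
feats = rng.standard_normal((n, d))
attended = self_attention(feats, *(rng.standard_normal((d, d)) for _ in range(3)))
reweighted = semantic_focus(attended.mean(axis=0), rng.standard_normal(m),
                            rng.standard_normal((d, m)))
```

In a mapping-based framework such as the one the abstract describes, a compatibility score between the pooled visual representation and the re-weighted attribute vectors of candidate classes would then drive classification; that scoring step is omitted here.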
Pages: 13
References
69 records in total
[1]   Multi-Cue Zero-Shot Learning with Strong Supervision [J].
Akata, Zeynep ;
Malinowski, Mateusz ;
Fritz, Mario ;
Schiele, Bernt .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :59-68
[2]   Label-Embedding for Image Classification [J].
Akata, Zeynep ;
Perronnin, Florent ;
Harchaoui, Zaid ;
Schmid, Cordelia .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (07) :1425-1438
[3]  
Akata Z, 2015, PROC CVPR IEEE, P2927, DOI 10.1109/CVPR.2015.7298911
[4]   Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J].
Anderson, Peter ;
He, Xiaodong ;
Buehler, Chris ;
Teney, Damien ;
Johnson, Mark ;
Gould, Stephen ;
Zhang, Lei .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :6077-6086
[5]   Preserving Semantic Relations for Zero-Shot Learning [J].
Annadani, Yashas ;
Biswas, Soma .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :7603-7612
[6]  
[Anonymous], 2013, NEURIPS
[7]   Adaptive Confidence Smoothing for Generalized Zero-Shot Learning [J].
Atzmon, Yuval ;
Chechik, Gal .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :11663-11672
[8]   RECOGNITION-BY-COMPONENTS - A THEORY OF HUMAN IMAGE UNDERSTANDING [J].
BIEDERMAN, I .
PSYCHOLOGICAL REVIEW, 1987, 94 (02) :115-147
[9]   Synthesized Classifiers for Zero-Shot Learning [J].
Changpinyo, Soravit ;
Chao, Wei-Lun ;
Gong, Boqing ;
Sha, Fei .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :5327-5336
[10]   An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild [J].
Chao, Wei-Lun ;
Changpinyo, Soravit ;
Gong, Boqing ;
Sha, Fei .
COMPUTER VISION - ECCV 2016, PT II, 2016, 9906 :52-68