Learning Multipart Attention Neural Network for Zero-Shot Classification

Cited by: 9
Authors
Meng, Min [1]
Wei, Jie [1]
Wu, Jigang [1]
Affiliations
[1] Guangdong Univ Technol, Dept Comp Sci, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Visualization; Neural networks; Training; Image recognition; Prototypes; Feature extraction; Attention mechanism; part annotations; visual recognition; zero-shot learning (ZSL);
DOI
10.1109/TCDS.2020.3044313
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Zero-shot learning (ZSL) models typically learn a cross-modal mapping between the visual feature space and the semantic embedding space. Despite the promising performance of existing methods, they usually take visual features from the whole image as the main input, while paying little attention to the image regions that are relevant to a human's visual response to the whole image. In this article, we propose a neural network-based ZSL model that incorporates an attention mechanism to discover the discriminative parts of each image. The proposed model automatically generates attention maps for visual parts, which provides a flexible way of encoding the salient visual aspects that distinguish the categories. Moreover, we introduce a simple yet effective objective function that exploits the pairwise label information between images and classes, resulting in a substantial performance improvement. When multiple semantic spaces are available, a multiple-attention scheme fuses the different semantic spaces, which further improves performance. On the widely used CUB-200-2011 data set for fine-grained image classification, we demonstrate the advantages of using the attention mechanism and semantic parts in our model for ZSL. Comprehensive experimental results show that the proposed approach outperforms state-of-the-art methods.
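The general idea summarized in the abstract, attending over part-level visual features and then scoring the result against class semantic embeddings, can be illustrated with a minimal sketch. This is not the authors' architecture: the query vector, the projection matrix `W`, the two-dimensional toy features, and all function and class names are hypothetical and chosen only to keep the example small.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_parts(part_features, query):
    """Return the attention-weighted sum of part features and the weights.

    Assumption: attention scores are plain dot products with a query vector.
    """
    weights = softmax([dot(p, query) for p in part_features])
    dim = len(part_features[0])
    pooled = [sum(w * p[i] for w, p in zip(weights, part_features))
              for i in range(dim)]
    return pooled, weights

def classify(part_features, query, W, class_embeddings):
    """Map the pooled visual feature into the semantic space with W
    (hypothetical linear mapping) and pick the best-matching class."""
    pooled, weights = attend_parts(part_features, query)
    projected = [dot(row, pooled) for row in W]
    scores = {name: dot(projected, emb)
              for name, emb in class_embeddings.items()}
    return max(scores, key=scores.get), weights

# Toy data: two parts (say, head and wing), two unseen classes.
parts = [[1.0, 0.0], [0.0, 1.0]]
query = [2.0, 0.0]                      # favours the first part
W = [[1.0, 0.0], [0.0, 1.0]]            # identity mapping for illustration
classes = {"unseen_a": [1.0, 0.0], "unseen_b": [0.0, 1.0]}

label, weights = classify(parts, query, W, classes)
print(label, weights)
```

In this toy setup the first part receives the larger attention weight, so the pooled feature lies closer to the embedding of `unseen_a`. A trained model would learn the query and the mapping from seen classes rather than fixing them by hand.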
Pages: 414-423
Page count: 10