Learning Multipart Attention Neural Network for Zero-Shot Classification

Cited by: 9
Authors
Meng, Min [1]
Wei, Jie [1]
Wu, Jigang [1]
Affiliations
[1] Guangdong Univ Technol, Dept Comp Sci, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Visualization; Neural networks; Training; Image recognition; Prototypes; Feature extraction; Attention mechanism; part annotations; visual recognition; zero-shot learning (ZSL)
DOI
10.1109/TCDS.2020.3044313
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Zero-shot learning (ZSL) models typically learn a cross-modal mapping between the visual feature space and the semantic embedding space. Despite the promising performance of existing methods, they usually take visual features from the whole image as their main input, while paying little attention to the image regions that are relevant to the human visual response to the whole image. In this article, we propose a neural network-based ZSL model that incorporates an attention mechanism to discover the discriminative parts of each image. The proposed model automatically generates attention maps for visual parts, which provides a flexible way of encoding the salient visual aspects that distinguish the categories. Moreover, we introduce a simple yet effective objective function that exploits the pairwise label information between images and classes, resulting in a substantial performance improvement. When multiple semantic spaces are available, a multiple-attention scheme is provided to fuse them, which further improves performance. On the widely used CUB-200-2011 data set for fine-grained image classification, we demonstrate the advantages of using the attention mechanism and semantic parts in our model for ZSL. Comprehensive experimental results show that the proposed approach outperforms state-of-the-art methods.
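The general idea in the abstract (weight part-level visual features by attention, map the pooled feature into the semantic space, and score it against class embeddings) can be illustrated with a minimal NumPy sketch. This is not the authors' architecture or objective: the function name `part_attention_scores`, the linear attention vector `w_att`, the linear map `w_map`, and all dimensions are hypothetical choices made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def part_attention_scores(part_feats, class_embs, w_att, w_map):
    """Score one image against all classes (illustrative assumption only).

    part_feats: (P, D) visual features, one row per image part
    class_embs: (C, K) semantic embeddings, one row per class
    w_att:      (D,)   hypothetical attention scoring vector
    w_map:      (D, K) hypothetical visual-to-semantic linear map
    """
    # Attention weights over parts: non-negative and summing to 1.
    weights = softmax(part_feats @ w_att)          # (P,)
    # Attention-weighted pooling of the part features.
    pooled = weights @ part_feats                  # (D,)
    # Cross-modal mapping into the semantic embedding space.
    projected = pooled @ w_map                     # (K,)
    # Compatibility with each class embedding (dot product).
    scores = class_embs @ projected                # (C,)
    return weights, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    parts = rng.normal(size=(7, 16))     # 7 parts, 16-dim features (made up)
    classes = rng.normal(size=(5, 8))    # 5 classes, 8-dim semantics (made up)
    weights, scores = part_attention_scores(
        parts, classes, rng.normal(size=16), rng.normal(size=(16, 8))
    )
    print("attention weights:", np.round(weights, 3))
    print("predicted class index:", int(np.argmax(scores)))
```

At test time in a zero-shot setting, `class_embs` would hold the embeddings of unseen classes, so the prediction is simply the class whose embedding is most compatible with the projected image feature. In a trained model the parameters would be learned rather than random.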
Pages: 414-423
Number of pages: 10