Adversarial erasing attention for fine-grained image classification

Cited by: 0
Authors
Jinsheng Ji
Linfeng Jiang
Tao Zhang
Weilin Zhong
Huilin Xiong
Affiliations
[1] Shanghai Jiao Tong University, Department of Automation, School of Electronic Information and Electrical Engineering
[2] Shanghai Jiao Tong University, Institute for Sensing and Navigation
Source
Multimedia Tools and Applications | 2021 / Vol. 80
Keywords
Fine-grained; Image classification; Multi-view; Visual attention; Adversarial erasing;
DOI
Not available
Abstract
Recognizing fine-grained subcategories is a challenging task due to the large intra-class diversity and small inter-class variance of fine-grained images. The common approach is to find the parts that can efficiently distinguish similar subcategories. Most previous works rely on manual annotations or attention mechanisms to localize the discriminative parts and have achieved great progress. However, manual annotations are costly to obtain in practical applications, and complicated constraints on the loss functions have to be adopted to localize the discriminative parts for building multi-view feature representations. To handle these challenges, the strategy of adversarial erasing is applied to the attention module in this paper, which learns to localize different discriminative parts by erasing the most discriminative one from the image. Without complicated loss functions, the proposed attention module can localize the discriminative parts more efficiently. Different from many part-based methods, the classification network introduced here consists of three subnetworks, trained on the original image and the two discriminative parts respectively. Moreover, features learned from the three subnetworks are fused in a more efficient way to build better feature representations. Four widely used datasets, CUB-200-2011, Stanford Dogs, Stanford Cars and FGVC-Aircraft, are used to evaluate the proposed method, and experimental results show that it can outperform several state-of-the-art methods without using manual annotations.
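The core adversarial-erasing idea in the abstract can be illustrated with a minimal NumPy sketch, which is not the authors' implementation: the most discriminative region is localized from an attention map, erased, and the localization repeated so the second pass is forced onto a different part. The peak-based localization and fixed square erasing window are simplifying assumptions for illustration.

```python
import numpy as np

def localize_peak(attn):
    """Locate the most discriminative position as the attention-map maximum."""
    r, c = np.unravel_index(np.argmax(attn), attn.shape)
    return int(r), int(c)

def erase(attn, center, radius):
    """Adversarial-erasing step: zero out a window around the located peak so
    the next localization pass must find a *different* discriminative part."""
    out = attn.copy()
    r, c = center
    out[max(0, r - radius):r + radius + 1,
        max(0, c - radius):c + radius + 1] = 0.0
    return out

def localize_parts(attn, n_parts=2, radius=1):
    """Iteratively localize `n_parts` discriminative regions:
    localize -> erase -> localize again."""
    parts = []
    for _ in range(n_parts):
        peak = localize_peak(attn)
        parts.append(peak)
        attn = erase(attn, peak, radius)
    return parts

# Toy attention map with two salient regions.
attn = np.zeros((6, 6))
attn[1, 1] = 0.9   # most discriminative part
attn[4, 4] = 0.8   # second part, revealed only after erasing the first
print(localize_parts(attn))  # -> [(1, 1), (4, 4)]
```

In the full method, the two localized parts would be cropped from the input image and, together with the original image, fed to the three subnetworks whose features are then fused.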
Pages: 22867-22889 (22 pages)