Integrating foreground-background feature distillation and contrastive feature learning for ultra-fine-grained visual classification

Cited by: 4
Authors
Chen, Qiupu [1 ,2 ]
Jiao, Lin [1 ,3 ]
Wang, Fenmei [1 ,2 ,4 ]
Du, Jianming [1 ]
Liu, Haiyun [1 ,2 ]
Wang, Xue [1 ]
Wang, Rujing [1 ,2 ]
Affiliations
[1] Inst Intelligent Machines, Hefei Inst Phys Sci, Chinese Acad Sci, Hefei, Anhui, Peoples R China
[2] Univ Sci & Technol China, Grad Sch, Isl Branch, Hefei, Peoples R China
[3] Anhui Univ, Sch Internet, Hefei 230031, Anhui, Peoples R China
[4] PLA Army Acad Artillery & Air Def, Hefei 230031, Peoples R China
Keywords
Ultra-fine-grained visual classification; Leaf cultivar identification; Self-supervised learning; Deep learning; Vision transformer
DOI
10.1016/j.patcog.2024.110339
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In pattern recognition, ultra-fine-grained visual classification (ultra-FGVC) has emerged as a paramount challenge, focusing on sub-category distinction within fine-grained objects. The near-indistinguishable similarities among such objects, combined with the dearth of sample data, intensify this challenge. In response, our FDCLDA method is introduced, which integrates Foreground-background feature Distillation (FD) and Contrastive feature Learning (CL) with Dual Augmentation (DA). This method uses two different data augmentation techniques, standard and auxiliary augmentation, to enhance model performance and generalization ability. The FD module reduces superfluous features and augments the contrast between the principal entity and its backdrop, while the CL module creates distinctive data representations by reducing intra-class variation and enhancing inter-class disparities. Integrating FDCLDA with different backbone architectures, such as ResNet-50, Vision Transformer, and Swin Transformer (Swin-T), significantly improves their performance, especially with Swin-T, leading to promising results on eight popular datasets for ultra-FGVC tasks.
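The contrastive-learning-with-dual-augmentation idea in the abstract can be illustrated with a minimal NT-Xent (normalized temperature-scaled cross-entropy) loss over two augmented views of each image, as popularized by SimCLR. This is a hedged sketch, not the paper's implementation: the perturbation-based "augmentations", embedding dimensions, and temperature here are illustrative assumptions.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss (SimCLR-style sketch).

    z1, z2: (N, D) embeddings of two augmented views of the same N images.
    Each pair (z1[i], z2[i]) is a positive; all other samples act as negatives,
    so the loss pulls same-image views together and pushes other images apart.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = z @ z.T / temperature                       # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    # index of each sample's positive partner in the concatenated batch
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
# stand-ins for "standard" and "auxiliary" augmentations: small perturbations
view1 = feats + 0.05 * rng.normal(size=feats.shape)
view2 = feats + 0.05 * rng.normal(size=feats.shape)
loss_aligned = nt_xent_loss(view1, view2)
loss_random = nt_xent_loss(view1, rng.normal(size=feats.shape))
print(loss_aligned < loss_random)  # aligned views yield a lower loss
```

In practice the two views would come from the backbone (e.g. Swin-T) applied to two differently augmented crops, and the loss would be minimized by gradient descent; here plain numpy suffices to show the objective itself.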
Pages: 11