A survey of fine-grained visual categorization based on deep learning

Cited by: 0

Authors:

Xie Yuxiang [1]
Gong Quanzhi [1]
Luan Xidao [2]
Yan Jie [1]
Zhang Jiahui [1]

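Affiliations:

[1] National University of Defense Technology, College of Systems Engineering, Changsha 410000, People's Republic of China

[2] Changsha University, College of Computer Engineering and Applied Mathematics, Changsha 410003, People's Republic of China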

Source:

Journal of Systems Engineering and Electronics | 2023

Funding:

National Natural Science Foundation of China

Keywords:

deep learning; fine-grained visual categorization; convolutional neural network (CNN); visual attention; attention; network

DOI:

10.23919/JSEE.2022.000155

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Deep learning has achieved excellent results on a wide range of computer vision tasks, especially fine-grained visual categorization, which aims to distinguish subordinate categories within a basic-level category. Because of high intra-class variance and high inter-class similarity, fine-grained visual categorization is extremely challenging. This paper first briefly introduces and analyzes the relevant public datasets and then reviews recent methods. According to the feature types, the feature processing methods, and the overall structure of the model, we divide these methods into three types: methods based on a general convolutional neural network (CNN) with strong part-level supervision, methods based on single-feature processing, and methods based on multiple-feature processing. Most methods of the first type have relatively simple structures, reflecting the early stage of research in this field. The other two types include models with specialized structures and training procedures that help obtain discriminative features. We analyze in detail several methods that achieve high accuracy on public datasets. In addition, we argue that future research should focus on reducing the demand of existing methods for large amounts of data and computing power. On the technical side, extracting subtle feature information with the emerging vision transformer (ViT) is also an important research direction.

Pages: 20

References: 95 in total
