Multi-Scale CNN for Fine-Grained Image Recognition

Citations: 25
Author
Won, Chee Sun [1 ]
Affiliation
[1] Dongguk Univ, Dept Elect & Elect Engn, Seoul 04620, South Korea
Funding
National Research Foundation of Singapore
Keywords
Convolutional neural network (CNN); fine-grained image classification; food recognition; image resizing; MODEL;
DOI
10.1109/ACCESS.2020.3005150
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Most conventional fine-grained image recognition methods are based on a two-stream model of object-level and part-level CNNs, where the part-level CNN is responsible for learning the object parts and their spatial relationships. To train the part-level CNN, the parts must first be separated from the object. However, some sub-level objects have no distinctive, separable parts. In this paper, a multi-scale CNN consisting of a baseline object-level CNN and multiple part-level CNNs is proposed for fine-grained image recognition with no separable object parts. The basic idea is to train the different CNNs of the multi-scale CNN by resizing the training images at different scales. That is, the training images are resized so that as much of the entire object as possible appears for the object-level CNN, while only a local part of the object is included for each part-level CNN. This scale-specific resizing requires a scale-controllable parameter in the image resizing process. Such a parameter is introduced for the linear-scaling and random-cropping method. In addition, a line-based image resizing method with a scale-controllable parameter is employed for the part-level CNNs. The proposed multi-scale CNN is applied to food image classification, a fine-grained classification problem with no separable object parts. Experimental results on public food image datasets show that classification accuracy improves substantially when the predicted scores of the multi-scale CNN are fused. This indicates that the object-level and part-level CNNs complement each other in distinguishing subtle differences between sub-level objects.
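The scale-controlled resizing and the score fusion described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the paper's definition: the parameter `s` (fraction of the shorter image side covered by the crop), the helper names, and the use of plain averaging as the fusion rule. The abstract does not specify how the scores are fused, and the line-based resizing method is not covered.

```python
import random


def resize_dims(width, height, crop_size, s):
    """Return (new_w, new_h) after linear scaling.

    Hypothetical scale parameter s in (0, 1]: the shorter side is scaled to
    round(crop_size / s), so a crop of crop_size pixels spans roughly a
    fraction s of that side. s = 1 keeps most of the object in view
    (object-level); smaller s leaves only a local part in the crop (part-level).
    """
    if not 0 < s <= 1:
        raise ValueError("s must be in (0, 1]")
    short_target = round(crop_size / s)
    ratio = short_target / min(width, height)
    new_w = max(crop_size, round(width * ratio))
    new_h = max(crop_size, round(height * ratio))
    return new_w, new_h


def random_crop_box(new_w, new_h, crop_size, rng=random):
    """Pick a random crop_size x crop_size box as (left, top, right, bottom)."""
    left = rng.randint(0, new_w - crop_size)
    top = rng.randint(0, new_h - crop_size)
    return left, top, left + crop_size, top + crop_size


def fuse_scores(score_lists):
    """Average per-class scores from several CNNs (assumed fusion rule).

    Returns the fused score list and the index of the top-scoring class.
    """
    n = len(score_lists)
    fused = [sum(col) / n for col in zip(*score_lists)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return fused, best


if __name__ == "__main__":
    # Object-level view (s = 1.0) versus a part-level view (s = 0.5).
    for s in (1.0, 0.5):
        w, h = resize_dims(640, 480, 224, s)
        print(s, (w, h), random_crop_box(w, h, 224))
    # Fuse the class scores of two hypothetical CNNs.
    print(fuse_scores([[0.6, 0.4], [0.2, 0.8]]))
```

With a fixed crop size, lowering `s` enlarges the resized image, so the same crop window covers a smaller portion of the object. That is the sense in which one parameter can switch a network between object-level and part-level training views in this sketch.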
Pages: 116663-116674
Page count: 12