Enhancing Food Image Recognition by Multi-Level Fusion and the Attention Mechanism

Cited by: 3
Authors
Chen, Zengzheng [1]
Wang, Jianxin [1]
Wang, Yeru [2]
Affiliations
[1] Beijing Forestry Univ, Sch Informat, Beijing 100083, Peoples R China
[2] China Natl Ctr Food Safety Risk Assessment, Risk Assessment Div 1, Beijing 100022, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
convolutional neural network; food recognition; self-attention mechanism; feature fusion; food science and technology; CLASSIFICATION; MODEL;
DOI
10.3390/foods14030461
Chinese Library Classification number
TS2 [Food industry];
Discipline classification code
0832;
Abstract
As a pivotal research area in computer vision, food identification technology has become indispensable in domains such as dietary nutrition monitoring, intelligent restaurant services, and quality control in the food industry. However, food image recognition is a Fine-Grained Visual Classification (FGVC) task, which presents challenges such as inter-class similarity, intra-class variability, and the difficulty of capturing intricate local features. Research on fine-grained visual classification has focused primarily on the deep-layer information in deep convolutional neural networks, often neglecting shallow, detailed information. Taking these factors into account, we propose a Multi-level Attention Feature Fusion Network (MAF-Net). Specifically, we use the feature maps generated at different stages of a Convolutional Neural Network (CNN) backbone as inputs. We apply a self-attention mechanism to identify local features on these feature maps and then stack them together. The feature vectors obtained through the attention mechanism are then integrated with the original input to enhance data augmentation. Simultaneously, to capture as many local features as possible, we encourage the multi-scale features at each stage to concentrate on distinct local regions by maximizing the Kullback-Leibler divergence (KL-divergence) between the different stages. Additionally, we present a novel approach called subclass center loss (SCloss) to implement label smoothing, minimize intra-class feature distribution differences, and enhance the model's generalization capability. Experiments conducted on three food image datasets (CETH Food-101, Vireo Food-172, and UEC Food-100) demonstrated the superiority of the proposed model, which achieved Top-1 accuracies of 90.22%, 89.86%, and 90.61% on them, respectively. Notably, our method not only outperformed other methods in Top-5 accuracy on Vireo Food-172 but also achieved the highest Top-1 accuracy on UEC Food-100.
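The stage-diversity idea in the abstract (pushing different backbone stages to attend to different regions by maximizing the KL-divergence between them) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of a softmax over per-region attention scores, and the averaging over ordered stage pairs are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Turn raw per-region attention scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def stage_diversity_loss(stage_scores):
    """Hypothetical loss term: negative mean pairwise KL between stages.

    Minimizing this value maximizes the divergence between the attention
    distributions of different stages (an assumed formulation).
    """
    dists = [softmax(s) for s in stage_scores]
    n = len(dists)
    if n < 2:
        return 0.0  # nothing to compare with a single stage
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    total = sum(kl_divergence(dists[i], dists[j]) for i, j in pairs)
    return -total / len(pairs)


# Toy example: three stages scoring the same four image regions.
toy_scores = [
    [4.0, 0.5, 0.2, 0.1],  # shallow stage favors region 0
    [0.3, 3.5, 0.4, 0.2],  # middle stage favors region 1
    [0.1, 0.2, 0.3, 4.2],  # deep stage favors region 3
]
print(stage_diversity_loss(toy_scores))
```

In this toy setup the loss is zero when all stages produce the same distribution and becomes more negative as the stages focus on different regions, which is the behavior a diversity term of this kind would be expected to reward.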
Pages: 21