Attention Guided Food Recognition via Multi-Stage Local Feature Fusion

Cited by: 1

Authors:

Deng, Gonghui [1]

Wu, Dunzhi [1]

Chen, Weizhen [1]

Affiliations:

[1] Wuhan Polytech Univ, Sch Elect & Elect Engn, Wuhan 430048, Peoples R China

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 80 / No. 02

Keywords:

Fine-grained image recognition; food image recognition; attention mechanism; local feature fusion

DOI:

10.32604/cmc.2024.052174

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Food image recognition, a subset of fine-grained image recognition, must cope with substantial intra-class variation and minimal inter-class differences. These challenges are compounded by the irregular and multi-scale nature of food images. To address them, our study introduces a model built on the ConvNeXt architecture that combines multiple attention mechanisms with multi-stage local fusion. The model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially mitigating the influence of background noise. It also introduces a multi-stage local fusion (MSLF) module that establishes long-distance dependencies between feature maps at different stages. This lets the model assimilate complementary features across scales and strengthens its capacity for feature extraction. In addition, we constructed a dataset named Roushi60, which consists of 60 categories of common meat dishes. Empirical evaluation on the ETH Food-101, ChineseFoodNet, and Roushi60 datasets shows that our model achieves recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively. These figures mark improvements of 1.04%, 3.42%, and 1.36% over the foundational ConvNeXt network and surpass the performance of most contemporary food image recognition methods, underscoring the efficacy of the proposed model for food image recognition.
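The abstract reports final accuracies together with gains over ConvNeXt, so the implied baseline accuracies can be recovered by subtraction. The sketch below does that arithmetic. It assumes the reported gains are absolute differences in percentage points, which the abstract does not state explicitly; the names `reported` and `implied_baseline` are illustrative only.

```python
# Accuracies (%) and gains over ConvNeXt, copied from the abstract.
reported = {
    "ETH Food-101": (91.12, 1.04),
    "ChineseFoodNet": (82.86, 3.42),
    "Roushi60": (92.50, 1.36),
}


def implied_baseline(accuracy, gain):
    # Assumption: the gain is an absolute difference in percentage points,
    # not a relative improvement.
    return round(accuracy - gain, 2)


baselines = {
    name: implied_baseline(accuracy, gain)
    for name, (accuracy, gain) in reported.items()
}

for name, value in baselines.items():
    print(f"{name}: implied ConvNeXt baseline {value:.2f}%")
```

Under that assumption the implied ConvNeXt baselines are 90.08%, 79.44%, and 91.14%. If the gains were relative instead, the baselines would differ, so these values should be checked against the paper's own results tables.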

Citation

Pages: 1985-2003

Page count: 19

50 entries in total

[1] Bossard L, 2014, LECT NOTES COMPUT SC, V8694, P446, DOI 10.1007/978-3-319-10599-4_29

[2] Chen, Xinlei; Gupta, Abhinav. Spatial Memory for Context Reasoning in Object Detection [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 4106-4116

[3] Chen, Yue; Bai, Yalong; Zhang, Wei; Mei, Tao. Destruction and Construction Learning for Fine-grained Image Recognition [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 5152-5161

[4] Ciocca, Gianluigi; Micali, Giovanni; Napoletano, Paolo. State Recognition of Food Images Using Deep Features [J]. IEEE ACCESS, 2020, 8: 32003-32017

[5] Dai, Jingzhao; Hu, Xuejiao; Li, Ming; Li, Yang; Du, Sidan. The multi-learning for food analyses in computer vision: a survey [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (17): 25615-25650

[6] Dubey A, 2018, ADV NEUR IN, V31

[7] Ege, Takumi; Yanai, Keiji. Image-Based Food Calorie Estimation Using Knowledge on Food Categories, Ingredients and Cooking Directions [J]. PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017: 367-375

[8] Feng, Shuo; Wang, Yangang; Gong, Jianhong; Li, Xiang; Li, Shangxuan. A fine-grained recognition technique for identifying Chinese food images [J]. HELIYON, 2023, 9 (11)

[9] Hassannejad, Hamid; Matrella, Guido; Ciampolini, Paolo; De Munari, Ilaria; Mordonini, Monica; Cagnoni, Stefano. Food Image Recognition Using Very Deep Convolutional Networks [J]. MADIMA'16: PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON MULTIMEDIA ASSISTED DIETARY MANAGEMENT, 2016: 41-49

[10] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Deep Residual Learning for Image Recognition [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778
