ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network

Cited by: 56
Authors
Min, Weiqing [1, 2]
Liu, Linhu [1, 2]
Wang, Zhiling [1, 2]
Luo, Zhengdong [1, 2]
Wei, Xiaoming [3]
Wei, Xiaolin [3]
Jiang, Shuqiang [1, 2]
Affiliations
[1] Chinese Acad Sci, Key Lab Intelligent Informat Proc, Inst Comp Technol, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Meituan Dianping Grp, Hong Kong, Peoples R China
Source
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020
Funding
National Natural Science Foundation of China;
Keywords
Food Recognition; Food Datasets; Benchmark; Deep Learning;
DOI
10.1145/3394171.3414031
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Food recognition has received increasing attention in the multimedia community because of its many real-world applications, such as diet management and self-service restaurants. A large-scale ontology of food images is urgently needed, both for developing advanced large-scale food recognition algorithms and for providing a benchmark dataset for such algorithms. To encourage further progress in food recognition, we introduce the ISIA Food-500 dataset, with 500 categories drawn from a list on Wikipedia and 399,726 images. It is more comprehensive than existing popular benchmark datasets in both category coverage and data volume. Furthermore, we propose a stacked global-local attention network for food recognition, which consists of two sub-networks. One sub-network first uses hybrid spatial-channel attention to extract more discriminative features, and then aggregates these multi-scale discriminative features from multiple layers into a global-level representation (e.g., texture and shape information about the food). The other sub-network generates attentional regions (e.g., ingredient-relevant regions) from different regions via cascaded spatial transformers, and then aggregates these multi-scale regional features from different layers into a local-level representation. The two types of features are finally fused into a comprehensive representation for food recognition. Extensive experiments on ISIA Food-500 and two other popular benchmark datasets demonstrate the effectiveness of the proposed method, which can therefore serve as a strong baseline. The dataset, code and models can be found at http://123.57.42.89/FoodComputing-Dataset/ISIA-Food500.html.
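The abstract describes the two-branch network only at a high level, so the following NumPy sketch illustrates the data flow under loudly stated assumptions. It is not the authors' released code, and every function name in it is hypothetical. The hybrid attention is parameter-free (a sigmoid over pooled activations) rather than learned; the "spatial transformers" apply fixed, hypothetical scale/translate parameters (`thetas`) with nearest-neighbour sampling instead of predicted parameters with bilinear sampling; features are global-average-pooled; and the global and local vectors are fused by plain concatenation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hybrid_spatial_channel_attention(fmap):
    """Reweight a (C, H, W) map first per channel, then per location.

    Assumption: parameter-free stand-in for learned attention layers.
    """
    channel_w = sigmoid(fmap.mean(axis=(1, 2)))       # (C,) one weight per channel
    fmap = fmap * channel_w[:, None, None]
    spatial_w = sigmoid(fmap.mean(axis=0))            # (H, W) one weight per location
    return fmap * spatial_w[None, :, :]


def affine_crop(fmap, scale, tx, ty, out_size):
    """Sample a region of a (C, H, W) map with a scale/translate transform.

    Assumption: a real spatial transformer predicts (scale, tx, ty) from the
    input and uses differentiable bilinear sampling; nearest-neighbour here.
    """
    _, h, w = fmap.shape
    lin = np.linspace(-1.0, 1.0, out_size)
    ys, xs = np.meshgrid(lin, lin, indexing="ij")     # output grid in [-1, 1]
    src_x, src_y = scale * xs + tx, scale * ys + ty   # source coords in [-1, 1]
    col = np.clip(np.rint((src_x + 1) * (w - 1) / 2).astype(int), 0, w - 1)
    row = np.clip(np.rint((src_y + 1) * (h - 1) / 2).astype(int), 0, h - 1)
    return fmap[:, row, col]                          # (C, out_size, out_size)


def global_representation(layer_maps):
    """Attend, average-pool, and concatenate features from several layers."""
    return np.concatenate(
        [hybrid_spatial_channel_attention(m).mean(axis=(1, 2)) for m in layer_maps]
    )


def local_representation(layer_maps, thetas):
    """Cascade crops (each applied to the previous crop) and pool each region."""
    feats = []
    for m in layer_maps:
        region = m
        for scale, tx, ty in thetas:
            region = affine_crop(region, scale, tx, ty, m.shape[1])
            feats.append(region.mean(axis=(1, 2)))
    return np.concatenate(feats)


def fuse(global_feat, local_feat):
    """Late fusion by concatenation (an assumption, not the paper's rule)."""
    return np.concatenate([global_feat, local_feat])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical multi-layer maps: (channels, side) for three backbone stages.
    maps = [rng.standard_normal((c, s, s)) for c, s in [(64, 28), (128, 14), (256, 7)]]
    thetas = [(0.5, 0.0, 0.0), (0.5, 0.2, -0.2)]      # fixed, hypothetical
    glob = global_representation(maps)
    loc = local_representation(maps, thetas)
    print(glob.shape, loc.shape, fuse(glob, loc).shape)  # (448,) (896,) (1344,)
```

With three layers of 64, 128 and 256 channels, the global vector has 448 entries, and two cascaded regions per layer give a 896-entry local vector. Concatenation is the simplest fusion choice; a learned fusion or classifier head would replace `fuse` in a trained model.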
Pages: 393-401
Number of pages: 9
Related papers
11 in total
  • [1] Gait recognition via weighted global-local feature fusion and attention-based multiscale temporal aggregation
    Xu, Yingqi
    Xi, Hao
    Ren, Kai
    Zhu, Qiyuan
    Hu, Chuanping
    JOURNAL OF ELECTRONIC IMAGING, 2025, 34 (01)
  • [2] Food Image Recognition via Multi-scale Jigsaw and Reconstruction Network
    Liu Y.-X.
    Min W.-Q.
    Jiang S.-Q.
    Rui Y.
    Ruan Jian Xue Bao/Journal of Software, 2022, 33 (11) : 4379 - 4395
  • [3] Automatic Sleep Staging Based on a Hybrid Stacked LSTM Neural Network: Verification Using Large-Scale Dataset
    Kuo, Chih-En
    Chen, Guan-Ting
    IEEE ACCESS, 2020, 8 : 111837 - 111849
  • [4] MSAPVT: a multi-scale attention pyramid vision transformer network for large-scale fruit recognition
    Rao, Yao
    Li, Chaofeng
    Xu, Feiran
    Guo, Ya
    JOURNAL OF FOOD MEASUREMENT AND CHARACTERIZATION, 2024, 18 (11) : 9233 - 9251
  • [5] DE-Net: Detail-enhanced MR reconstruction network via global-local dependent attention
    Zhu, Jiali
    Hu, Dianlin
    Mao, Weilong
    Zhu, Jianfeng
    Hu, Rihan
    Chen, Yang
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 95
  • [6] Large-scale seasonal forecasts of river discharge by coupling local and global datasets with a stacked neural network: Case for the Loire River system
    Vu, M. T.
    Jardani, A.
    Krimissa, M.
    Zaoui, F.
    Massei, N.
    SCIENCE OF THE TOTAL ENVIRONMENT, 2023, 897
  • [7] Cross-Modal Object Tracking via Modality-Aware Fusion Network and a Large-Scale Dataset
    Liu, Lei
    Zhang, Mengya
    Li, Cheng
    Li, Chenglong
    Tang, Jin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024 : 1 - 14
  • [8] GGLA-NeXtE2NET: A Dual-Branch Ensemble Network With Gated Global-Local Attention for Enhanced Brain Tumor Recognition
    Saeed, Adnan
    Shehzad, Khurram
    Bhatti, Shahzad Sarwar
    Ahmed, Saim
    Azar, Ahmad Taher
    IEEE ACCESS, 2025, 13 : 7234 - 7257
  • [9] Receptive-Field and Direction Induced Attention Network for Infrared Dim Small Target Detection With a Large-Scale Dataset IRDST
    Sun, Heng
    Bai, Junxiang
    Yang, Fan
    Bai, Xiangzhi
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [10] ISPANet: A Pyramid Self-Attention Network for Single-Frame High-Resolution Infrared Small Target Detection With a Large-Scale Dataset SHR-IRST
    Wang, Wenjing
    Xiao, Chengwang
    Dou, Haofeng
    Liang, Ruixiang
    Yuan, Huaibin
    Zhao, Guanghui
    Chen, Zhiwei
    Huang, Yuhang
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 11146 - 11162