Wide-Slice Residual Networks for Food Recognition

Cited by: 141
Authors
Martinel, Niki [1]
Foresti, Gian Luca [2]
Micheloni, Christian [1]
Affiliations
[1] Univ Udine, Machine Learning Percept Lab, Udine, Italy
[2] Univ Udine, AViReS Lab, Udine, Italy
Source
2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018) | 2018
Keywords
DIETARY ASSESSMENT;
DOI
10.1109/WACV.2018.00068
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Image-based food recognition poses new challenges for mainstream computer vision algorithms. Recent works in the field have focused either on hand-crafted representations or on learning them with deep neural networks (DNNs). Despite their success, DNN-based works rely on off-the-shelf deep architectures that are not tailored to the specific food classification problem. We believe that better results can be obtained if the architecture is designed around an analysis of food composition. Following this intuition, this work introduces a new deep scheme designed to handle the food structure. In particular, we focus on the vertical food traits that are common to a large number of categories (i.e., 15% of the whole data in current datasets). To this end, we first introduce a slice convolution block to capture such specific information. Then, we leverage the recent success of deep residual blocks and combine them with the slice convolution to produce the classification score. Extensive evaluations on three benchmark datasets demonstrate that our solution outperforms existing approaches (e.g., a top-1 accuracy of 90.27% on the Food-101 dataset).
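The two-branch idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract gives no layer details, so the full-width kernel, the element-wise stand-in for residual layers, the concatenation step, and every function name below are assumptions made for illustration only.

```python
# Toy sketch only: every name, shape, and design choice here is hypothetical.

def slice_conv(image, kernel):
    """Slide a kernel that spans the full image width down the rows.

    image:  H x W list of lists (single channel).
    kernel: kh x W list of lists (assumed to be as wide as the image).
    Returns H - kh + 1 responses, one per vertical position.
    """
    height, width = len(image), len(image[0])
    kh = len(kernel)
    assert all(len(row) == width for row in kernel), "kernel must span the width"
    out = []
    for top in range(height - kh + 1):
        acc = 0.0
        for i in range(kh):
            for j in range(width):
                acc += image[top + i][j] * kernel[i][j]
        out.append(acc)
    return out


def relu(xs):
    return [max(0.0, x) for x in xs]


def residual_block(xs, weight):
    """y = relu(x + F(x)); F is a toy element-wise map standing in for conv layers."""
    return relu([x + weight * max(0.0, x) for x in xs])


def row_means(image):
    """Crude per-row pooling used as input to the toy residual branch."""
    return [sum(row) / len(row) for row in image]


def linear(features, weights):
    """One score per class: dot product of each weight row with the features."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def predict_scores(image, kernel, weights, res_weight=0.5):
    """Concatenate slice-branch and residual-branch features, then score them."""
    slice_feats = relu(slice_conv(image, kernel))
    res_feats = residual_block(row_means(image), res_weight)
    return linear(slice_feats + res_feats, weights)


# A 4 x 3 "image" with two horizontal layers and a kernel that responds
# to the boundary between a bright layer and a dark layer below it.
layered = [[1, 1, 1], [1, 1, 1], [0, 0, 0], [0, 0, 0]]
boundary_kernel = [[1, 1, 1], [-1, -1, -1]]
responses = slice_conv(layered, boundary_kernel)
```

In this sketch the full-width kernel yields one response per vertical position, so the response peaks where two stacked layers meet. That is one plausible way to read "vertical food traits"; the paper itself should be consulted for the actual block design.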
Pages: 567-576
Page count: 10