ESE-GAN: Zero-Shot Food Image Classification Based on Low Dimensional Embedding of Visual Features

Cited by: 2
Authors
Li, Gaojie [1]
Li, Yaochen [1]
Liu, Jingle [1]
Guo, Wei [1]
Tang, Wenneng [1]
Liu, Yuehu [2]
Affiliations
[1] Xi'an Jiaotong University, School of Software Engineering, Xi'an 710049, People's Republic of China
[2] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an 710049, People's Republic of China
Keywords
Visualization; Semantics; Zero-shot learning; Generative adversarial networks; Training; Task analysis; Image classification; food classification; semantic feature; latent attributes
DOI
10.1109/TMM.2024.3353457
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Existing image classification methods based on zero-shot learning transform the zero-shot learning problem into a supervised learning problem by applying a generative adversarial network (GAN) to synthesize visual features of unseen classes. However, the visual features generated by the generator tend to be biased towards seen classes, and the discriminator is too weak to generate high-quality image features. To solve these problems, we propose a novel zero-shot food image classification method based on low-dimensional embedding of visual features. Our method applies reinforced semantic guidance to increase the discriminative ability of the model by enhancing the strong distribution of input features. Moreover, the visual space is used as the embedding space, which reduces the bias towards seen classes by reducing the distance between semantic information and visual features in that space. Finally, the feature distribution of unseen classes is further specified by improving the prototype similarity function. Extensive experiments on three food datasets and four general benchmark datasets demonstrate the effectiveness of the proposed method.
Pages: 2713-2723
Number of pages: 11