Cross-domain Cross-modal Food Transfer

Cited by: 5
Authors
Zhu, Bin [1 ]
Ngo, Chong-Wah [1 ]
Chen, Jing-jing [2 ]
Affiliations
[1] City Univ Hong Kong, Hong Kong, Peoples R China
[2] Fudan Univ, Shanghai, Peoples R China
Source
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020
Keywords
food recognition; cross-modal food retrieval; cross-domain transfer;
DOI
10.1145/3394171.3413809
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent work on cross-modal image-to-recipe retrieval paves a new way to scale up food recognition. By learning a joint embedding space between food images and recipes, food recognition boils down to a retrieval problem: evaluating the similarity of embedded features. The major drawback, nevertheless, is the difficulty of applying an already-trained model to recognize dishes from cuisines unknown to the model. In general, updating the model with new training examples, in the form of image-recipe pairs, is required to adapt it to the new cooking styles of a cuisine. In practice, however, acquiring a sufficient number of image-recipe pairs for model transfer can be time-consuming. This paper addresses the challenge of resource scarcity in the scenario where only partial data, rather than a complete view of the data, is accessible for model transfer. Partial data refers to missing information, such as the absence of the image modality or of cooking instructions from an image-recipe pair. To cope with partial data, a novel generic model, equipped with various loss functions including cross-modal metric learning, a recipe residual loss, semantic regularization, and adversarial learning, is proposed for cross-domain transfer learning. Experiments are conducted on three cuisines (Chuan, Yue, and Washoku) to provide insights on scaling up food recognition across domains with limited training resources.
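The cross-modal metric learning mentioned in the abstract is commonly realized as a triplet loss that pulls an image embedding toward its paired recipe embedding and pushes it away from a non-matching one. The sketch below is an illustrative toy version with made-up unit vectors and a hypothetical margin value, not the authors' actual implementation:

```python
import numpy as np

def cross_modal_triplet_loss(img, rec_pos, rec_neg, margin=0.3):
    """Hinge-based triplet loss over L2-normalised embedding vectors.

    Encourages the image to be more similar (dot product) to its paired
    recipe than to a non-matching recipe by at least `margin`.
    """
    sim_pos = float(np.dot(img, rec_pos))  # similarity to the true pair
    sim_neg = float(np.dot(img, rec_neg))  # similarity to a negative
    return max(0.0, margin - sim_pos + sim_neg)

# Toy unit vectors: the positive recipe matches the image exactly,
# so the margin is satisfied and the loss is zero.
img = np.array([1.0, 0.0])
rec_pos = np.array([1.0, 0.0])
rec_neg = np.array([0.0, 1.0])
loss = cross_modal_triplet_loss(img, rec_pos, rec_neg)  # 0.0
```

In the paper's setting this objective would be combined with the other losses listed above (recipe residual, semantic regularization, adversarial) and optimized over learned image and recipe encoders rather than fixed vectors.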
Pages: 3762-3770 (9 pages)