Cross-domain Cross-modal Food Transfer

Cited by: 5
Authors
Zhu, Bin [1 ]
Ngo, Chong-Wah [1 ]
Chen, Jing-jing [2 ]
Affiliations
[1] City Univ Hong Kong, Hong Kong, Peoples R China
[2] Fudan Univ, Shanghai, Peoples R China
Source
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020
Keywords
food recognition; cross-modal food retrieval; cross-domain transfer;
DOI
10.1145/3394171.3413809
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent work on cross-modal image-to-recipe retrieval paves a new way to scale up food recognition. By learning a joint embedding space between food images and recipes, food recognition boils down to a retrieval problem: evaluating the similarity of embedded features. The major drawback, nevertheless, is the difficulty of applying an already-trained model to recognize dishes from cuisines unknown to the model. In general, updating the model with new training examples, in the form of image-recipe pairs, is required to adapt it to the new cooking styles of a cuisine. In practice, however, acquiring a sufficient number of image-recipe pairs for model transfer can be time-consuming. This paper addresses the challenge of resource scarcity in the scenario where only partial data, rather than a complete view of the data, is accessible for model transfer. Partial data refers to missing information, such as the absence of the image modality or of cooking instructions from an image-recipe pair. To cope with partial data, a novel generic model, equipped with various loss functions including cross-modal metric learning, a recipe residual loss, semantic regularization, and adversarial learning, is proposed for cross-domain transfer learning. Experiments are conducted on three cuisines (Chuan, Yue, and Washoku) to provide insights on scaling up food recognition across domains with limited training resources.
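The cross-modal metric learning mentioned in the abstract is commonly realized as a triplet loss that pulls an image embedding toward its paired recipe embedding and pushes it away from a non-matching one. The sketch below is an illustrative toy version with made-up unit vectors and a hypothetical margin value, not the authors' actual implementation:

```python
import numpy as np

def cross_modal_triplet_loss(img, rec_pos, rec_neg, margin=0.3):
    """Hinge-based triplet loss over L2-normalised embedding vectors.

    Encourages the image to be more similar (dot product) to its paired
    recipe than to a non-matching recipe by at least `margin`.
    """
    sim_pos = float(np.dot(img, rec_pos))  # similarity to the true pair
    sim_neg = float(np.dot(img, rec_neg))  # similarity to a negative
    return max(0.0, margin - sim_pos + sim_neg)

# Toy unit vectors: the positive recipe matches the image exactly,
# so the margin is satisfied and the loss is zero.
img = np.array([1.0, 0.0])
rec_pos = np.array([1.0, 0.0])
rec_neg = np.array([0.0, 1.0])
loss = cross_modal_triplet_loss(img, rec_pos, rec_neg)  # 0.0
```

In the paper's setting this objective would be combined with the other losses listed above (recipe residual, semantic regularization, adversarial) and optimized over learned image and recipe encoders rather than fixed vectors.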
Pages: 3762-3770 (9 pages)