Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings

Cited by: 102
Authors
Carvalho, Micael [1]
Cadene, Remi [1]
Picard, David [1]
Soulier, Laure [1]
Thome, Nicolas [2]
Cord, Matthieu [1]
Affiliations
[1] Sorbonne Univ, CNRS, LIP6, F-75005 Paris, France
[2] CEDRIC Conservatoire Natl Arts & Metiers, F-75003 Paris, France
Source
ACM/SIGIR PROCEEDINGS 2018 | 2018
Keywords
Deep Learning; Cross-modal Retrieval; Semantic Embeddings
DOI
10.1145/3209978.3210036
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Designing powerful tools that support cooking activities has rapidly gained popularity due to the massive amounts of available data, as well as recent advances in machine learning that are capable of analyzing them. In this paper, we propose a cross-modal retrieval model aligning visual and textual data (like pictures of dishes and their recipes) in a shared representation space. We describe an effective learning scheme, capable of tackling large-scale problems, and validate it on the Recipe1M dataset containing nearly 1 million picture-recipe pairs. We show the effectiveness of our approach regarding previous state-of-the-art models and present qualitative results over computational cooking use cases.
Pages: 35-44
Number of pages: 10