Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings

Citations: 113
Authors
Carvalho, Micael [1]
Cadene, Remi [1]
Picard, David [1]
Soulier, Laure [1]
Thome, Nicolas [2]
Cord, Matthieu [1]
Affiliations
[1] Sorbonne Univ, CNRS, LIP6, F-75005 Paris, France
[2] CEDRIC Conservatoire Natl Arts & Metiers, F-75003 Paris, France
Source
ACM/SIGIR PROCEEDINGS 2018 | 2018
Keywords
Deep Learning; Cross-modal Retrieval; Semantic Embeddings
DOI
10.1145/3209978.3210036
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Designing powerful tools that support cooking activities has rapidly gained popularity due to the massive amounts of available data, as well as recent advances in machine learning that are capable of analyzing them. In this paper, we propose a cross-modal retrieval model aligning visual and textual data (like pictures of dishes and their recipes) in a shared representation space. We describe an effective learning scheme, capable of tackling large-scale problems, and validate it on the Recipe1M dataset containing nearly 1 million picture-recipe pairs. We show the effectiveness of our approach regarding previous state-of-the-art models and present qualitative results over computational cooking use cases.
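To make the idea of a shared representation space concrete, the following is a minimal sketch of retrieval by cosine similarity, assuming image and recipe embeddings have already been produced by some trained encoders. It is not the authors' model or learning scheme; the function names (`l2_normalize`, `cosine_similarity`, `retrieve`) and the toy three-dimensional vectors are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve(query, candidates, top_k=1):
    """Return indices of the top_k candidates most similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Hypothetical toy embeddings (assumption: both modalities already live in one space).
image_embedding = [0.9, 0.1, 0.0]
recipe_embeddings = [
    [0.0, 1.0, 0.0],  # recipe 0
    [1.0, 0.2, 0.0],  # recipe 1
    [0.0, 0.0, 1.0],  # recipe 2
]

# Image-to-recipe retrieval: rank recipes by similarity to the image embedding.
best = retrieve(image_embedding, recipe_embeddings, top_k=2)
print(best)
```

The same `retrieve` call works in the other direction (a recipe embedding as the query against a list of image embeddings), which is what makes a shared space convenient for cross-modal search.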
Pages: 35-44
Page count: 10