Cross-modal recipe retrieval via parallel- and cross-attention networks learning

Cited by: 10
Authors
Cao, Da [1]
Chu, Jingjing [1]
Zhu, Ningbo [1]
Nie, Liqiang [2]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266000, Shandong, Peoples R China
Funding
National Natural Science Foundation of China; National Science Foundation (USA);
Keywords
Recipe retrieval; Parallel-attention network; Cross-attention network; Cross-modal retrieval;
DOI
10.1016/j.knosys.2019.105428
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal recipe retrieval refers to the problem of retrieving a food image from a list of image candidates given a textual recipe as the query, or vice versa. However, existing cross-modal recipe retrieval approaches mostly focus on learning the representations of images and recipes independently and stitching them together by projecting them into a common space. Such methods overlook the interplay between images and recipes, resulting in suboptimal retrieval performance. To address this, we study the problem of cross-modal recipe retrieval from the viewpoint of learning parallel- and cross-attention networks. Specifically, we first exploit a parallel-attention network to independently learn the attention weights of components in images and recipes. Thereafter, a cross-attention network is proposed to explicitly learn the interplay between images and recipes, which simultaneously considers word-guided image attention and image-guided word attention. Lastly, the learnt representations of images and recipes stemming from the parallel- and cross-attention networks are carefully combined and optimized using a pairwise ranking loss. By experimenting on two datasets, we demonstrate the effectiveness and rationality of our proposed solution through both overall performance comparison and micro-level analyses. (c) 2019 Published by Elsevier B.V.
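The abstract names a pairwise ranking loss but gives no formula. As a rough illustration only, a common bidirectional hinge formulation over cosine similarity might look like the sketch below. The function names `cosine` and `pairwise_ranking_loss`, the margin value of 0.3, and the use of all other batch items as negatives are assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_ranking_loss(image_embs, recipe_embs, margin=0.3):
    """Bidirectional hinge ranking loss over a batch of matched pairs.

    Assumption: image_embs[i] and recipe_embs[i] form a matching pair,
    and every other item in the batch is treated as a negative.
    The margin of 0.3 is an illustrative value, not one from the paper.
    """
    n = len(image_embs)
    # sim[i][j] = similarity between image i and recipe j
    sim = [[cosine(img, rec) for rec in recipe_embs] for img in image_embs]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Image i as query: mismatched recipe j should score lower
            # than the matching recipe i by at least the margin.
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
            # Recipe i as query: mismatched image j should score lower
            # than the matching image i by at least the margin.
            loss += max(0.0, margin - sim[i][i] + sim[j][i])
    return loss / n
```

In this sketch the loss is zero once every matching pair outscores every mismatched pair by at least the margin in both retrieval directions (image-to-recipe and recipe-to-image), which mirrors the two query directions the abstract describes.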
Pages: 12