Cross-modal recipe retrieval via parallel- and cross-attention networks learning

Cited by: 10
Authors
Cao, Da [1 ]
Chu, Jingjing [1 ]
Zhu, Ningbo [1 ]
Nie, Liqiang [2 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266000, Shandong, Peoples R China
Funding
National Natural Science Foundation of China; U.S. National Science Foundation;
Keywords
Recipe retrieval; Parallel-attention network; Cross-attention network; Cross-modal retrieval;
DOI
10.1016/j.knosys.2019.105428
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal recipe retrieval refers to the problem of retrieving a food image from a list of candidate images given a textual recipe as the query, or vice versa. Existing cross-modal recipe retrieval approaches mostly focus on learning the representations of images and recipes independently and stitching them together by projecting them into a common space. Such methods overlook the interplay between images and recipes, resulting in suboptimal retrieval performance. Toward this end, we study the problem of cross-modal recipe retrieval from the viewpoint of parallel- and cross-attention network learning. Specifically, we first employ a parallel-attention network to independently learn the attention weights of components in images and recipes. Thereafter, a cross-attention network is proposed to explicitly learn the interplay between images and recipes, which simultaneously considers word-guided image attention and image-guided word attention. Lastly, the learnt representations of images and recipes stemming from the parallel- and cross-attention networks are carefully combined and optimized using a pairwise ranking loss. By experimenting on two datasets, we demonstrate the effectiveness and rationality of our proposed solution in terms of both overall performance comparison and micro-level analyses. (c) 2019 Published by Elsevier B.V.
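The abstract does not give implementation details, but the two core ideas it names (a cross-attention network combining word-guided image attention with image-guided word attention, and a pairwise ranking loss over the resulting embeddings) can be illustrated with a minimal PyTorch-style sketch. The module names, dimensions, use of mean-pooled summaries as attention guides, in-batch negatives, and the margin value are all assumptions for illustration, not the authors' actual architecture.

```python
# Minimal sketch (NOT the paper's implementation) of cross-attention between
# image regions and recipe words, plus a hinge-based pairwise ranking loss.
# All shapes, pooling choices, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttention(nn.Module):
    """Word-guided image attention and image-guided word attention."""

    def __init__(self, img_dim, txt_dim, common_dim):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, common_dim)  # project region features
        self.txt_proj = nn.Linear(txt_dim, common_dim)  # project word features
        self.img_att = nn.Linear(2 * common_dim, 1)     # scores for image regions
        self.txt_att = nn.Linear(2 * common_dim, 1)     # scores for recipe words

    def forward(self, img_regions, txt_words):
        # img_regions: (B, R, img_dim); txt_words: (B, W, txt_dim)
        v = self.img_proj(img_regions)                  # (B, R, D)
        t = self.txt_proj(txt_words)                    # (B, W, D)
        t_bar = t.mean(dim=1, keepdim=True)             # recipe summary guides image attention
        v_bar = v.mean(dim=1, keepdim=True)             # image summary guides word attention

        a_img = self.img_att(torch.cat([v, t_bar.expand_as(v)], dim=-1))  # (B, R, 1)
        a_txt = self.txt_att(torch.cat([t, v_bar.expand_as(t)], dim=-1))  # (B, W, 1)
        img_emb = (F.softmax(a_img, dim=1) * v).sum(dim=1)  # attended image embedding
        txt_emb = (F.softmax(a_txt, dim=1) * t).sum(dim=1)  # attended recipe embedding
        return img_emb, txt_emb


def pairwise_ranking_loss(img_emb, txt_emb, margin=0.3):
    """Triplet-style ranking loss using in-batch negatives (one common variant)."""
    sim = F.normalize(img_emb, dim=-1) @ F.normalize(txt_emb, dim=-1).t()  # (B, B)
    pos = sim.diag().unsqueeze(1)                       # similarity of matched pairs
    cost_txt = (margin + sim - pos).clamp(min=0)        # image as query, negative recipes
    cost_img = (margin + sim - pos.t()).clamp(min=0)    # recipe as query, negative images
    mask = ~torch.eye(sim.size(0), dtype=torch.bool)    # drop the positive-pair diagonal
    return cost_txt[mask].mean() + cost_img[mask].mean()
```

In this sketch the loss pushes each matched image-recipe pair to score higher than any mismatched pair in the batch by at least the margin, which is the usual way a pairwise ranking objective is instantiated for bidirectional (image-to-recipe and recipe-to-image) retrieval.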
Pages: 12