Cross-modal recipe retrieval via parallel- and cross-attention networks learning

Cited by: 10
Authors
Cao, Da [1 ]
Chu, Jingjing [1 ]
Zhu, Ningbo [1 ]
Nie, Liqiang [2 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266000, Shandong, Peoples R China
Funding
National Natural Science Foundation of China; U.S. National Science Foundation;
Keywords
Recipe retrieval; Parallel-attention network; Cross-attention network; Cross-modal retrieval;
DOI
10.1016/j.knosys.2019.105428
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal recipe retrieval refers to the problem of retrieving a food image from a list of candidate images given a textual recipe as the query, or vice versa. Existing cross-modal recipe retrieval approaches mostly focus on learning the representations of images and recipes independently and stitching them together by projecting them into a common space. Such methods overlook the interplay between images and recipes, resulting in suboptimal retrieval performance. Toward this end, we study the problem of cross-modal recipe retrieval from the viewpoint of parallel- and cross-attention network learning. Specifically, we first employ a parallel-attention network to independently learn the attention weights of components in images and recipes. Thereafter, a cross-attention network is proposed to explicitly learn the interplay between images and recipes, which simultaneously considers word-guided image attention and image-guided word attention. Lastly, the learnt representations of images and recipes stemming from the parallel- and cross-attention networks are carefully combined and optimized using a pairwise ranking loss. By experimenting on two datasets, we demonstrate the effectiveness and rationality of our proposed solution in terms of both overall performance comparison and micro-level analyses. (c) 2019 Published by Elsevier B.V.
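The abstract does not give implementation details, but the two core ideas it names (a cross-attention network combining word-guided image attention with image-guided word attention, and a pairwise ranking loss over the resulting embeddings) can be illustrated with a minimal PyTorch-style sketch. The module names, dimensions, use of mean-pooled summaries as attention guides, in-batch negatives, and the margin value are all assumptions for illustration, not the authors' actual architecture.

```python
# Minimal sketch (NOT the paper's implementation) of cross-attention between
# image regions and recipe words, plus a hinge-based pairwise ranking loss.
# All shapes, pooling choices, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttention(nn.Module):
    """Word-guided image attention and image-guided word attention."""

    def __init__(self, img_dim, txt_dim, common_dim):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, common_dim)  # project region features
        self.txt_proj = nn.Linear(txt_dim, common_dim)  # project word features
        self.img_att = nn.Linear(2 * common_dim, 1)     # scores for image regions
        self.txt_att = nn.Linear(2 * common_dim, 1)     # scores for recipe words

    def forward(self, img_regions, txt_words):
        # img_regions: (B, R, img_dim); txt_words: (B, W, txt_dim)
        v = self.img_proj(img_regions)                  # (B, R, D)
        t = self.txt_proj(txt_words)                    # (B, W, D)
        t_bar = t.mean(dim=1, keepdim=True)             # recipe summary guides image attention
        v_bar = v.mean(dim=1, keepdim=True)             # image summary guides word attention

        a_img = self.img_att(torch.cat([v, t_bar.expand_as(v)], dim=-1))  # (B, R, 1)
        a_txt = self.txt_att(torch.cat([t, v_bar.expand_as(t)], dim=-1))  # (B, W, 1)
        img_emb = (F.softmax(a_img, dim=1) * v).sum(dim=1)  # attended image embedding
        txt_emb = (F.softmax(a_txt, dim=1) * t).sum(dim=1)  # attended recipe embedding
        return img_emb, txt_emb


def pairwise_ranking_loss(img_emb, txt_emb, margin=0.3):
    """Triplet-style ranking loss using in-batch negatives (one common variant)."""
    sim = F.normalize(img_emb, dim=-1) @ F.normalize(txt_emb, dim=-1).t()  # (B, B)
    pos = sim.diag().unsqueeze(1)                       # similarity of matched pairs
    cost_txt = (margin + sim - pos).clamp(min=0)        # image as query, negative recipes
    cost_img = (margin + sim - pos.t()).clamp(min=0)    # recipe as query, negative images
    mask = ~torch.eye(sim.size(0), dtype=torch.bool)    # drop the positive-pair diagonal
    return cost_txt[mask].mean() + cost_img[mask].mean()
```

In this sketch the loss pushes each matched image-recipe pair to score higher than any mismatched pair in the batch by at least the margin, which is the usual way a pairwise ranking objective is instantiated for bidirectional (image-to-recipe and recipe-to-image) retrieval.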
Pages: 12