Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval

Cited by: 1

Authors:

Zou, Zhuoyang [1]

Zhu, Xinghui [1]

Zhu, Qinying [1]

Zhang, Hongyan [1]

Zhu, Lei [1]

Affiliations:

[1] Hunan Agr Univ, Coll Informat & Intelligence, Changsha 410128, Peoples R China

Source:

FOODS | 2024 / Vol. 13 / Issue 11

Funding:

National Natural Science Foundation of China;

Keywords:

cross-modal recipe retrieval; multi-modal alignment; food image ambiguity; deep learning; TRANSFORMER;

DOI:

10.3390/foods13111628

Chinese Library Classification (CLC) number:

TS2 [Food industry];

Discipline classification code:

0832;

Abstract:

As a prominent topic in food computing, cross-modal recipe retrieval has garnered substantial attention. However, the semantic alignment across food images and recipes cannot be further enhanced due to the lack of intra-modal alignment in existing solutions. Additionally, a critical issue named food image ambiguity is overlooked, which disrupts the convergence of models. To these ends, we propose a novel Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval (MMACMR). To consider inter-modal and intra-modal alignment together, this method measures the ambiguous food image similarity under the guidance of their corresponding recipes. Additionally, we enhance recipe semantic representation learning by involving a cross-attention module between ingredients and instructions, which is effective in supporting food image similarity measurement. We conduct experiments on the challenging public dataset Recipe1M; as a result, our method outperforms several state-of-the-art methods in commonly used evaluation criteria.
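The abstract mentions a cross-attention module between ingredients and instructions. As a rough illustration only, the following is a minimal NumPy sketch of generic scaled dot-product cross-attention between two token sequences. It is not the authors' MMACMR implementation: the function names, the embedding size, the random inputs, the absence of learned projection matrices, and the mean pooling at the end are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Scaled dot-product attention in which one sequence (queries)
    # attends to another (keys_values). No learned projections are
    # used here; a real model would typically apply W_q, W_k, W_v.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ keys_values, weights

# Hypothetical token embeddings: 5 ingredient tokens, 8 instruction tokens.
rng = np.random.default_rng(0)
ingredients = rng.normal(size=(5, 16))
instructions = rng.normal(size=(8, 16))

# Ingredients attend to instructions, and instructions attend to ingredients.
ing_ctx, ing_w = cross_attention(ingredients, instructions)
ins_ctx, ins_w = cross_attention(instructions, ingredients)

# Assumed pooling step: average each attended sequence and concatenate
# the two vectors into a single recipe representation.
recipe_vec = np.concatenate([ing_ctx.mean(axis=0), ins_ctx.mean(axis=0)])
```

Each row of `ing_w` is a probability distribution over instruction tokens, so every ingredient representation in `ing_ctx` is a weighted mixture of instruction embeddings, and vice versa for `ins_ctx`.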

References

Pages: 16

54 entries in total

[1] Cross-modal recipe retrieval via parallel- and cross-attention networks learning [J]. Cao, Da; Chu, Jingjing; Zhu, Ningbo; Nie, Liqiang. KNOWLEDGE-BASED SYSTEMS, 2020, 193

[2] Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings [J]. Carvalho, Micael; Cadene, Remi; Picard, David; Soulier, Laure; Thome, Nicolas; Cord, Matthieu. ACM/SIGIR PROCEEDINGS 2018, 2018: 35-44

[3] Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval [J]. Chen, Jing-Jing; Ngo, Chong-Wah; Feng, Fu-Li; Chua, Tat-Seng. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018: 1020-1028

[4] Multimodal Encoders for Food-Oriented Cross-Modal Retrieval [J]. Chen, Ying; Zhou, Dong; Li, Lin; Han, Jun-mei. WEB AND BIG DATA, APWEB-WAIM 2021, PT II, 2021, 12859: 253-266

[5] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848

[6] Dosovitskiy A, 2021, Arxiv, DOI arXiv:2010.11929

[7] Fu H., 2020, P IEEE CVF C COMP VI, P14570

[8] Cross-Modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Representation Learning [J]. Guerrero, Ricardo; Pham, Hai X.; Pavlovic, Vladimir. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 3192-3201

[9] Guo GD, 2003, LECT NOTES COMPUT SC, V2888, P986

[10] Fast Nondestructive Detection Technology and Equipment for Food Quality and Safety [J]. Guo, Zhiming; Jayan, Heera. FOODS, 2023, 12 (20)
