Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval

Cited by: 1
Authors
Zou, Zhuoyang [1]
Zhu, Xinghui [1]
Zhu, Qinying [1]
Zhang, Hongyan [1]
Zhu, Lei [1]
Affiliations
[1] Hunan Agr Univ, Coll Informat & Intelligence, Changsha 410128, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
cross-modal recipe retrieval; multi-modal alignment; food image ambiguity; deep learning; transformer
DOI
10.3390/foods13111628
Chinese Library Classification number
TS2 [Food Industry]
Discipline classification code
0832
Abstract
As a prominent topic in food computing, cross-modal recipe retrieval has garnered substantial attention. However, existing solutions lack intra-modal alignment, which limits how far the semantic alignment between food images and recipes can be improved. Additionally, a critical issue named food image ambiguity is overlooked, which disrupts the convergence of models. To address both issues, we propose a novel Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval (MMACMR). To consider inter-modal and intra-modal alignment together, this method measures the similarity between ambiguous food images under the guidance of their corresponding recipes. Additionally, we enhance recipe semantic representation learning by introducing a cross-attention module between ingredients and instructions, which is effective in supporting food image similarity measurement. We conduct experiments on the challenging public dataset Recipe1M, where our method outperforms several state-of-the-art methods on commonly used evaluation criteria.
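The abstract mentions a cross-attention module between ingredients and instructions but gives no architectural detail. The following is only a minimal sketch of generic scaled dot-product cross-attention, not the authors' MMACMR module. The function names, the embedding sizes, the random inputs, the absence of learned projection matrices, and the mean-pooling step are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Scaled dot-product attention where one sequence attends to another.

    Hypothetical simplification: no learned query/key/value projections,
    so keys and values are the same matrix.
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (n_q, n_kv)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ keys_values, weights           # (n_q, d), (n_q, n_kv)


# Assumed toy inputs: 5 ingredient tokens and 7 instruction sentences,
# each already embedded into an 8-dimensional space.
rng = np.random.default_rng(0)
ingredients = rng.normal(size=(5, 8))
instructions = rng.normal(size=(7, 8))

# Ingredients attend to instructions, and instructions attend to ingredients.
ing_attended, ing_weights = cross_attention(ingredients, instructions)
ins_attended, ins_weights = cross_attention(instructions, ingredients)

# Assumed pooling: average each attended sequence and concatenate
# the two vectors into a single recipe representation.
recipe_embedding = np.concatenate(
    [ing_attended.mean(axis=0), ins_attended.mean(axis=0)]
)
```

In this sketch each ingredient representation becomes a weighted mixture of instruction embeddings and vice versa, which is the general idea behind letting the two recipe components inform each other. A trained model would add learned projections and further layers.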
Pages: 16