Cross-Modal Retrieval: A Review of Methodologies, Datasets, and Future Perspectives

Cited by: 2
Authors
Han, Zhichao [1 ]
Azman, Azreen Bin [1 ]
Mustaffa, Mas Rina Binti [1 ]
Khalid, Fatimah Binti [1 ]
Affiliations
[1] Univ Putra Malaysia, Fac Comp Sci & Informat Technol, Serdang 43400, Malaysia
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Feature extraction; Correlation; Semantics; Task analysis; Reviews; Benchmark testing; Representation learning; Deep learning; Information retrieval; Artificial intelligence; Cross-modal retrieval; deep learning; review; RECONSTRUCTION; NETWORKS
DOI
10.1109/ACCESS.2024.3444817
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
With the rapid development of science and technology, mixed media of all types produce large amounts of data, and traditional single-modality retrieval can no longer satisfy everyday needs. Consequently, there is a pressing need for cross-modal retrieval technology. Its purpose is to mine the connections between samples of different modalities, that is, to use a sample in one modality to retrieve semantically similar samples in another modality. For example, users can retrieve multimedia data such as images or videos with a text query. However, different types of multimedia data have different modal representations, and measuring the correlation between different modalities is the central problem of cross-modal retrieval. Deep learning methods, currently the most popular approach, have achieved remarkable results in data processing and computer vision, and many researchers have applied them to cross-modal retrieval to address the problem of measuring similarity between different types of multimedia data. By summarizing the relevant literature, this paper provides a definition of the cross-modal retrieval problem, reviews the core ideas of current mainstream cross-modal retrieval methods organized into three main categories, lists commonly used datasets and evaluation methods, and finally analyzes the open problems and future research trends of cross-modal retrieval.
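The core idea described in the abstract, mapping samples from different modalities into a common space and ranking them by similarity, can be sketched minimally as follows. This is a hypothetical illustration, not the paper's method: the projection matrices are random stand-ins for the learned mappings that the surveyed approaches would train, and the feature dimensions (300-d text, 2048-d image) are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    # Normalize rows to unit length so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy features: one text query (300-d) and five image candidates (2048-d).
text_feat = rng.normal(size=(1, 300))
image_feats = rng.normal(size=(5, 2048))

# In a real system these projections into a shared 128-d space would be
# learned (e.g. by a deep network); random matrices stand in for them here.
W_text = rng.normal(size=(300, 128))
W_image = rng.normal(size=(2048, 128))

query = l2_normalize(text_feat @ W_text)
candidates = l2_normalize(image_feats @ W_image)

# Cosine similarity in the shared space ranks the images for the text query.
scores = (query @ candidates.T).ravel()
ranking = np.argsort(-scores)
print(ranking)
```

Once both modalities live in one space, retrieval in either direction (text-to-image or image-to-text) reduces to the same nearest-neighbor ranking, which is why measuring cross-modal correlation is framed as the central problem.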
Pages: 115716-115741
Number of pages: 26
References
136 in total
  • [1] Al-Halah Z, 2014, PROCEEDINGS OF THE 2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, THEORY AND APPLICATIONS (VISAPP 2014), VOL 2, P48
  • [2] Andrew G., 2013, P 30 INT C MACH LEAR, P1247
  • [3] [Anonymous], 2009, P ACM INT C IMAGE VI
  • [4] Bai B, Weston J, Grangier D, Collobert R, Sadamasa K, Qi Y, Chapelle O, Weinberger K, 2010, Learning to rank with (a lot of) word features, INFORMATION RETRIEVAL, 13(3): 291-314
  • [5] Bethge M, 2019, arXiv:1904.00760, DOI 10.48550/arXiv.1904.00760
  • [6] Blei DM, Ng AY, Jordan MI, 2003, Latent Dirichlet allocation, JOURNAL OF MACHINE LEARNING RESEARCH, 3(4-5): 993-1022
  • [7] Bronstein MM, 2010, PROC CVPR IEEE, P3594, DOI 10.1109/CVPR.2010.5539928
  • [8] Burger W., 2022, Digital Image Processing: An Algorithmic Introduction, P709
  • [9] Chen ZD, 2018, AAAI CONF ARTIF INTE, P274
  • [10] Cheng M, Jing L, Ng MK, 2020, Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval, ACM TRANSACTIONS ON INFORMATION SYSTEMS, 38(3)