Cross-Modal Retrieval: A Review of Methodologies, Datasets, and Future Perspectives

Cited by: 2
Authors
Han, Zhichao [1 ]
Azman, Azreen Bin [1 ]
Mustaffa, Mas Rina Binti [1 ]
Khalid, Fatimah Binti [1 ]
Affiliations
[1] Univ Putra Malaysia, Fac Comp Sci & Informat Technol, Serdang 43400, Malaysia
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Feature extraction; Correlation; Semantics; Task analysis; Reviews; Benchmark testing; Representation learning; Deep learning; Information retrieval; Artificial intelligence; Cross-modal retrieval; review; RECONSTRUCTION; NETWORKS
DOI
10.1109/ACCESS.2024.3444817
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Discipline Classification Code
0812;
Abstract
With the rapid development of science and technology, mixed media of all types now contain large amounts of data, and retrieval within a single modality can no longer satisfy everyday needs. Consequently, there is a pressing need for cross-modal retrieval technology. Its purpose is to mine the connections between samples of different modalities, that is, to use a sample in one modality to retrieve semantically similar samples in another modality; for example, users can retrieve images or videos with a text query. However, different types of multimedia data are represented in different ways, and measuring the correlation between modalities is the central problem of cross-modal retrieval. Deep learning methods, currently the most popular approach, have achieved remarkable results in data processing and graphics, and many researchers have applied them to cross-modal retrieval to address similarity measurement between heterogeneous multimedia data. By summarizing the relevant literature, this paper defines the cross-modal retrieval problem, reviews the core ideas of current mainstream methods under three main categories, lists commonly used datasets and evaluation metrics, and finally analyzes open problems and future research trends in cross-modal retrieval.
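To make the retrieval setting described in the abstract concrete, the sketch below is a minimal, illustrative example and not a method from the surveyed literature: features from two modalities are projected into a shared embedding space, queries from one modality are ranked against items of the other by cosine similarity, and the ranking is scored with mean average precision (mAP), the metric most commonly reported for cross-modal retrieval. The feature dimensions, random projection matrices, and category labels are all hypothetical placeholders standing in for learned mappings and real annotations.

```python
# Minimal sketch (not the paper's method): rank items of one modality against
# queries from another in a shared embedding space, then score the ranking
# with mean average precision (mAP). The "encoders" below are fixed random
# projections standing in for learned mappings, so the score is meaningless.
import numpy as np

rng = np.random.default_rng(0)

n = 6                                     # number of paired image/text samples
img_feat = rng.normal(size=(n, 2048))     # hypothetical image features
txt_feat = rng.normal(size=(n, 300))      # hypothetical text features
labels = np.array([0, 0, 1, 1, 2, 2])     # hypothetical semantic categories

d = 128                                   # dimensionality of the common space
img_proj = rng.normal(size=(2048, d))     # stand-in for a learned image mapping
txt_proj = rng.normal(size=(300, d))      # stand-in for a learned text mapping

def embed(x, proj):
    """Project into the common space and L2-normalize (cosine geometry)."""
    z = x @ proj
    return z / np.linalg.norm(z, axis=1, keepdims=True)

img_emb = embed(img_feat, img_proj)
txt_emb = embed(txt_feat, txt_proj)

# Text-to-image retrieval: cosine similarity is a dot product of unit vectors.
sim = txt_emb @ img_emb.T                 # (n_text_queries, n_images)
ranking = np.argsort(-sim, axis=1)        # images sorted by similarity per query

def mean_average_precision(ranking, query_labels, gallery_labels):
    """mAP over all queries; an item is relevant if its label matches the query's."""
    aps = []
    for q, order in enumerate(ranking):
        rel = (gallery_labels[order] == query_labels[q]).astype(float)
        if rel.sum() == 0:
            continue
        precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
        aps.append((precision_at_k * rel).sum() / rel.sum())
    return float(np.mean(aps))

print("text->image mAP:", mean_average_precision(ranking, labels, labels))
```

In practice the two projection matrices would be learned jointly (for example with canonical correlation analysis or the deep networks discussed in the review) so that semantically related image-text pairs land close together in the common space.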
Pages: 115716-115741
Number of pages: 26