Learning sufficient scene representation for unsupervised cross-modal retrieval

Cited by: 6
Authors
Luo, Jieting [1]
Wo, Yan [1]
Wu, Bicheng [1]
Han, Guoqiang [1]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Keywords
Unsupervised cross-modal retrieval; Common representation; Statistical manifold; Gaussian Mixture Model; Geodesic distance
DOI
10.1016/j.neucom.2021.07.078
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, a novel unsupervised Cross-Modal retrieval method via Sufficient Scene Representation (CMSSR) is proposed. Distinguished from the existing methods, which mainly focus on simultaneously preserving the mutually-constrained intra- and inter-modal similarity relation, CMSSR considers data of different modalities as the descriptions of a scene from different views and accordingly integrates information of different modalities to learn a complete common representation containing sufficient information of the corresponding scene. To obtain such a common representation, a Gaussian Mixture Model (GMM) is first utilized to generate a statistical representation of each uni-modal data, while the uni-modal spaces are accordingly abstracted as uni-modal statistical manifolds. In addition, the common space is assumed to be a high-dimensional statistical manifold with different uni-modal statistical manifolds as its sub-manifolds. In order to generate a sufficient scene representation from uni-modal data, a representation completion strategy based on logistic regression is proposed to effectively complete the missing representation of another modality. Then, the similarity between different multi-modal data can be more accurately reflected by the distance metric in the common statistical manifold. Based on the distance metric in the common statistical manifold, Iterative Quantization is utilized to further generate binary code for fast cross-modal retrieval. Extensive experiments on three standard benchmark datasets fully demonstrate the superiority of CMSSR compared with several state-of-the-art methods. © 2021 Published by Elsevier B.V.
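The abstract's final step, Iterative Quantization followed by binary-code retrieval, can be illustrated with a minimal sketch. This is the generic ITQ procedure (PCA projection, then alternating between sign quantization and an orthogonal Procrustes rotation update) applied to plain Euclidean feature vectors; it is not the authors' implementation, and it does not use the statistical-manifold distance that CMSSR builds on. The function names, the parameters `n_bits`, `n_iter`, and `seed`, and the Hamming-distance helper are all assumptions made for illustration.

```python
import numpy as np


def itq_binary_codes(features, n_bits, n_iter=50, seed=0):
    """Generic ITQ sketch (assumed setup, not the CMSSR pipeline).

    features: (n_samples, n_dims) real-valued array, n_bits <= n_dims.
    Returns (codes, rotation): codes is an (n_samples, n_bits) 0/1 array.
    """
    rng = np.random.default_rng(seed)

    # Zero-center and project onto the top n_bits principal directions.
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_bits].T

    # Start from a random orthogonal rotation (QR of a Gaussian matrix).
    rotation, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))

    for _ in range(n_iter):
        # Fix the rotation, update the codes by taking signs.
        codes = np.sign(projected @ rotation)
        codes[codes == 0] = 1
        # Fix the codes, update the rotation (orthogonal Procrustes).
        u, _, wt = np.linalg.svd(codes.T @ projected)
        rotation = wt.T @ u.T

    codes = (projected @ rotation >= 0).astype(np.uint8)
    return codes, rotation


def hamming_distances(query_code, database_codes):
    """Count differing bits between one code and every database code."""
    return np.count_nonzero(database_codes != query_code, axis=1)
```

Retrieval with such codes would rank database items by `hamming_distances(query_code, database_codes)`, which is what makes binary codes fast to search compared with real-valued distances.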
Pages: 404-418
Page count: 15