Learning sufficient scene representation for unsupervised cross-modal retrieval

被引:6
|
作者
Luo, Jieting [1 ]
Wo, Yan [1 ]
Wu, Bicheng [1 ]
Han, Guoqiang [1 ]
机构
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
关键词
Unsupervised cross-modal retrieval; Common representation; Statistical manifold; Gaussian Mixture Model; Geodesic distance;
D O I
10.1016/j.neucom.2021.07.078
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, a novel unsupervised Cross-Modal retrieval method via Sufficient Scene Representation (CMSSR) is proposed. Distinguished from the existing methods which mainly focus on simultaneously preserving the mutually-constrained intra-and inter-modal similarity relation, CMSSR considers data of different modalities as the descriptions of a scene from different views and accordingly integrates information of different modalities to learn a complete common representation containing sufficient information of the corresponding scene. To obtain such common representation, Gaussian Mixture Model (GMM) is firstly utilized to generate statistic representation of each uni-modal data, while the uni-modal spaces are accordingly abstracted as uni-modal statistical manifolds. In addition, the common space is assumed to be a high-dimensional statistical manifold with different uni-modal statistical man-ifolds as its sub-manifolds. In order to generate sufficient scene representation from uni-modal data, a representation completion strategy based on logistic regression is proposed to effectively complete the missing representation of another modality. Then, the similarity between different multi-modal data can be more accurately reflected by the distance metric in common statistical manifold. Based on the dis-tance metric in common statistical manifold, Iterative Quantization is utilized to further generate binary code for fast cross-modal retrieval. Extensive experiments on three standard benchmark datasets fully demonstrate the superiority of CMSSR compared with several state-of-the-art methods. (c) 2021 Published by Elsevier B.V.
引用
收藏
页码:404 / 418
页数:15
相关论文
共 50 条
  • [11] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [12] Cross-Modal Guided Visual Representation Learning for Social Image Retrieval
    Guan, Ziyu
    Zhao, Wanqing
    Liu, Hongmin
    Nakashima, Yuta
    Noboru, Babaguchi
    He, Xiaofei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (03) : 2186 - 2198
  • [13] Category-Level Contrastive Learning for Unsupervised Hashing in Cross-Modal Retrieval
    Xu, Mengying
    Luo, Linyin
    Lai, Hanjiang
    Yin, Jian
    DATA SCIENCE AND ENGINEERING, 2024, 9 (03) : 251 - 263
  • [14] Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval
    Cheng, Miaomiao
    Jing, Liping
    Ng, Michael K.
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2020, 38 (03)
  • [15] Continual learning in cross-modal retrieval
    Wang, Kai
    Herranz, Luis
    van de Weijer, Joost
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3623 - 3633
  • [16] Learning DALTS for cross-modal retrieval
    Yu, Zheng
    Wang, Wenmin
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2019, 4 (01) : 9 - 16
  • [17] Sequential Learning for Cross-modal Retrieval
    Song, Ge
    Tan, Xiaoyang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4531 - 4539
  • [18] Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text
    Schindler, Alexander
    Gordea, Sergiu
    Knees, Peter
    PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020, : 706 - 713
  • [19] Cross-Modal Discrete Representation Learning
    Liu, Alexander H.
    Jin, SouYoung
    Lai, Cheng-I Jeff
    Rouditchenko, Andrew
    Oliva, Aude
    Glass, James
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 3013 - 3035
  • [20] Deep Learning and Shared Representation Space Learning Based Cross-Modal Multimedia Retrieval
    Zou, Hui
    Du, Ji-Xiang
    Zhai, Chuan-Min
    Wang, Jing
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2016, PT II, 2016, 9772 : 322 - 331