Learning sufficient scene representation for unsupervised cross-modal retrieval

Cited: 6
Authors
Luo, Jieting [1 ]
Wo, Yan [1 ]
Wu, Bicheng [1 ]
Han, Guoqiang [1 ]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Keywords
Unsupervised cross-modal retrieval; Common representation; Statistical manifold; Gaussian Mixture Model; Geodesic distance
DOI
10.1016/j.neucom.2021.07.078
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In this paper, a novel unsupervised Cross-Modal retrieval method via Sufficient Scene Representation (CMSSR) is proposed. Unlike existing methods, which mainly focus on simultaneously preserving the mutually constrained intra- and inter-modal similarity relations, CMSSR treats data of different modalities as descriptions of the same scene from different views and accordingly integrates information across modalities to learn a complete common representation containing sufficient information about the corresponding scene. To obtain such a common representation, a Gaussian Mixture Model (GMM) is first utilized to generate a statistical representation of each uni-modal datum, so that each uni-modal space is abstracted as a uni-modal statistical manifold. The common space is then assumed to be a higher-dimensional statistical manifold with the uni-modal statistical manifolds as its sub-manifolds. To generate a sufficient scene representation from uni-modal data, a representation completion strategy based on logistic regression is proposed to effectively complete the missing representation of the other modality. The similarity between multi-modal data can then be more accurately reflected by the distance metric on the common statistical manifold. Based on this distance metric, Iterative Quantization is utilized to further generate binary codes for fast cross-modal retrieval. Extensive experiments on three standard benchmark datasets demonstrate the superiority of CMSSR over several state-of-the-art methods. (c) 2021 Published by Elsevier B.V.
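The abstract's key ingredient is a geodesic distance on a statistical manifold. The paper's actual metric (on GMM-based manifolds) is not given here, but the idea can be illustrated in its simplest setting: the Fisher-Rao geodesic distance between two univariate Gaussians, which has a closed form because the Gaussian manifold with the Fisher metric is (up to a factor of sqrt(2)) the Poincaré upper half-plane. The function below is an illustrative sketch of that closed form, not the CMSSR implementation.

```python
import math

def fisher_rao_distance(mu1, sigma1, mu2, sigma2):
    """Closed-form Fisher-Rao geodesic distance between the
    univariate Gaussians N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Mapping (mu, sigma) -> (mu / sqrt(2), sigma) embeds the Gaussian
    manifold isometrically (up to sqrt(2)) into the hyperbolic upper
    half-plane, whose distance is arccosh-based.
    """
    # Hyperbolic "gap" term between the two parameter points.
    gap = (mu1 - mu2) ** 2 / 2.0 + (sigma1 - sigma2) ** 2
    cosh_d = 1.0 + gap / (2.0 * sigma1 * sigma2)
    return math.sqrt(2.0) * math.acosh(cosh_d)

# Identical distributions are at distance zero.
d_same = fisher_rao_distance(0.0, 1.0, 0.0, 1.0)
# Equal means, scales 1 vs e: distance is sqrt(2)*|ln e| = sqrt(2),
# the known special case for a pure change of scale.
d_scale = fisher_rao_distance(0.0, 1.0, 0.0, math.e)
```

Unlike Euclidean distance on the raw parameters, this metric accounts for the curvature of the distribution space: two wide Gaussians with slightly different means are much closer than two narrow ones with the same mean gap, which is the behavior a common statistical manifold exploits for cross-modal similarity.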
Pages: 404-418
Page count: 15
Related papers
50 records
  • [31] Revising similarity relationship hashing for unsupervised cross-modal retrieval
    Wu, You
    Li, Bo
    Li, Zhixin
    NEUROCOMPUTING, 2025, 614
  • [32] Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval
    Li, Chao
    Deng, Cheng
    Wang, Lei
    Xie, De
    Liu, Xianglong
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 176 - 183
  • [33] ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval
    Cheng, Mengjun
    Sun, Yipeng
    Wang, Longchao
    Zhu, Xiongwei
    Yao, Kun
    Chen, Jie
    Song, Guoli
    Han, Junyu
    Liu, Jingtuo
    Ding, Errui
    Wang, Jingdong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5174 - 5183
  • [34] StacMR: Scene-Text Aware Cross-Modal Retrieval
    Mafla, Andres
    Rezende, Rafael S.
    Gomez, Lluis
    Larlus, Diane
    Karatzas, Dimosthenis
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 2219 - 2229
  • [35] Cross-Modal Retrieval Using Deep Learning
    Malik, Shaily
    Bhardwaj, Nikhil
    Bhardwaj, Rahul
    Kumar, Saurabh
    PROCEEDINGS OF THIRD DOCTORAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE, DOSCI 2022, 2023, 479 : 725 - 734
  • [36] Learning Cross-Modal Retrieval with Noisy Labels
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Zhen, Liangli
    Lin, Jie
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5399 - 5409
  • [37] Joint-Modal Graph Convolutional Hashing for unsupervised cross-modal retrieval
    Meng, Hui
    Zhang, Huaxiang
    Liu, Li
    Liu, Dongmei
    Lu, Xu
    Guo, Xinru
    NEUROCOMPUTING, 2024, 595
  • [38] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou
    Zhao, Zishuo
    Lin, Zhenzhou
    Shen, Ying
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 145 - 153
  • [39] Federated learning for supervised cross-modal retrieval
    Li, Ang
    Li, Yawen
    Shao, Yingxia
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (04):
  • [40] Quaternion Representation Learning for cross-modal matching
    Wang, Zheng
    Xu, Xing
    Wei, Jiwei
    Xie, Ning
    Shao, Jie
    Yang, Yang
    KNOWLEDGE-BASED SYSTEMS, 2023, 270