Learning sufficient scene representation for unsupervised cross-modal retrieval

Cited: 6
Authors
Luo, Jieting [1 ]
Wo, Yan [1 ]
Wu, Bicheng [1 ]
Han, Guoqiang [1 ]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Keywords
Unsupervised cross-modal retrieval; Common representation; Statistical manifold; Gaussian Mixture Model; Geodesic distance
DOI
10.1016/j.neucom.2021.07.078
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In this paper, a novel unsupervised Cross-Modal retrieval method via Sufficient Scene Representation (CMSSR) is proposed. Unlike existing methods, which mainly focus on simultaneously preserving the mutually constrained intra- and inter-modal similarity relations, CMSSR treats data of different modalities as descriptions of the same scene from different views and accordingly integrates information across modalities to learn a complete common representation containing sufficient information about the corresponding scene. To obtain such a common representation, a Gaussian Mixture Model (GMM) is first utilized to generate a statistical representation of each uni-modal datum, so that each uni-modal space is abstracted as a uni-modal statistical manifold. The common space is then assumed to be a higher-dimensional statistical manifold with the uni-modal statistical manifolds as its sub-manifolds. To generate a sufficient scene representation from uni-modal data, a representation completion strategy based on logistic regression is proposed to effectively complete the missing representation of the other modality. The similarity between multi-modal data can then be more accurately reflected by the distance metric on the common statistical manifold. Based on this distance metric, Iterative Quantization is utilized to further generate binary codes for fast cross-modal retrieval. Extensive experiments on three standard benchmark datasets demonstrate the superiority of CMSSR over several state-of-the-art methods. (c) 2021 Published by Elsevier B.V.
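The abstract's key ingredient is a geodesic distance on a statistical manifold. The paper's actual metric (on GMM-based manifolds) is not given here, but the idea can be illustrated in its simplest setting: the Fisher-Rao geodesic distance between two univariate Gaussians, which has a closed form because the Gaussian manifold with the Fisher metric is (up to a factor of sqrt(2)) the Poincaré upper half-plane. The function below is an illustrative sketch of that closed form, not the CMSSR implementation.

```python
import math

def fisher_rao_distance(mu1, sigma1, mu2, sigma2):
    """Closed-form Fisher-Rao geodesic distance between the
    univariate Gaussians N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Mapping (mu, sigma) -> (mu / sqrt(2), sigma) embeds the Gaussian
    manifold isometrically (up to sqrt(2)) into the hyperbolic upper
    half-plane, whose distance is arccosh-based.
    """
    # Hyperbolic "gap" term between the two parameter points.
    gap = (mu1 - mu2) ** 2 / 2.0 + (sigma1 - sigma2) ** 2
    cosh_d = 1.0 + gap / (2.0 * sigma1 * sigma2)
    return math.sqrt(2.0) * math.acosh(cosh_d)

# Identical distributions are at distance zero.
d_same = fisher_rao_distance(0.0, 1.0, 0.0, 1.0)
# Equal means, scales 1 vs e: distance is sqrt(2)*|ln e| = sqrt(2),
# the known special case for a pure change of scale.
d_scale = fisher_rao_distance(0.0, 1.0, 0.0, math.e)
```

Unlike Euclidean distance on the raw parameters, this metric accounts for the curvature of the distribution space: two wide Gaussians with slightly different means are much closer than two narrow ones with the same mean gap, which is the behavior a common statistical manifold exploits for cross-modal similarity.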
Pages: 404-418
Page count: 15
Related papers
50 records
  • [31] Revising similarity relationship hashing for unsupervised cross-modal retrieval
    Wu, You
    Li, Bo
    Li, Zhixin
    NEUROCOMPUTING, 2025, 614
  • [32] Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval
    Li, Chao
    Deng, Cheng
    Wang, Lei
    Xie, De
    Liu, Xianglong
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 176 - 183
  • [33] ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval
    Cheng, Mengjun
    Sun, Yipeng
    Wang, Longchao
    Zhu, Xiongwei
    Yao, Kun
    Chen, Jie
    Song, Guoli
    Han, Junyu
    Liu, Jingtuo
    Ding, Errui
    Wang, Jingdong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5174 - 5183
  • [34] StacMR: Scene-Text Aware Cross-Modal Retrieval
    Mafla, Andres
    Rezende, Rafael S.
    Gomez, Lluis
    Larlus, Diane
    Karatzas, Dimosthenis
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 2219 - 2229
  • [35] Cross-Modal Retrieval Using Deep Learning
    Malik, Shaily
    Bhardwaj, Nikhil
    Bhardwaj, Rahul
    Kumar, Saurabh
    PROCEEDINGS OF THIRD DOCTORAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE, DOSCI 2022, 2023, 479 : 725 - 734
  • [36] Learning Cross-Modal Retrieval with Noisy Labels
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Zhen, Liangli
    Lin, Jie
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5399 - 5409
  • [37] Joint-Modal Graph Convolutional Hashing for unsupervised cross-modal retrieval
    Meng, Hui
    Zhang, Huaxiang
    Liu, Li
    Liu, Dongmei
    Lu, Xu
    Guo, Xinru
    NEUROCOMPUTING, 2024, 595
  • [38] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou
    Zhao, Zishuo
    Lin, Zhenzhou
    Shen, Ying
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 145 - 153
  • [39] Federated learning for supervised cross-modal retrieval
    Li, Ang
    Li, Yawen
    Shao, Yingxia
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (04):
  • [40] Quaternion Representation Learning for cross-modal matching
    Wang, Zheng
    Xu, Xing
    Wei, Jiwei
    Xie, Ning
    Shao, Jie
    Yang, Yang
    KNOWLEDGE-BASED SYSTEMS, 2023, 270