Aggregation-Based Graph Convolutional Hashing for Unsupervised Cross-Modal Retrieval

Times Cited: 100
Authors
Zhang, Peng-Fei [1]
Li, Yang [1]
Huang, Zi [1]
Xu, Xin-Shun [2]
Affiliations
[1] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld 4072, Australia
[2] Shandong Univ, Sch Software, Jinan 250101, Shandong, Peoples R China
Funding
Australian Research Council
Keywords
Semantics; Convolutional codes; Binary codes; Convolution; Measurement; Feature extraction; Sparse matrices; Multimodal; unsupervised hashing; cross-modal search; graph convolutional networks; BINARY-CODES; ROBUST
DOI
10.1109/TMM.2021.3053766
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Cross-modal hashing has attracted much attention in large-scale information retrieval for its storage and query efficiency. Despite the great success achieved by supervised approaches, existing unsupervised hashing methods still suffer from a lack of reliable learning guidance and from the discrepancy between modalities. In this paper, we propose Aggregation-based Graph Convolutional Hashing (AGCH) to tackle these obstacles. First, considering that a single similarity metric can hardly represent data relationships comprehensively, we develop an efficient aggregation strategy that utilises multiple metrics to construct a more precise affinity matrix for learning. Specifically, we apply various similarity measures to exploit the structural information of multiple modalities from different perspectives and then aggregate the obtained information into a joint similarity matrix. Furthermore, a novel deep model is designed to learn unified binary codes across modalities; its key components are modality-specific encoders, Graph Convolutional Networks (GCNs) and a fusion module. The modality-specific encoders learn feature embeddings for each individual modality; on this basis, GCNs further mine the semantic structure of the data, and a fusion module correlates the different modalities. Extensive experiments on three real-world datasets demonstrate that the proposed method significantly outperforms state-of-the-art competitors.
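This record does not include the authors' implementation. As a rough illustration of the aggregation idea described in the abstract (computing several per-modality similarity matrices and combining them into one joint affinity matrix), the following is a minimal NumPy sketch. The choice of cosine and Gaussian-kernel metrics, the uniform weights, and all function names are assumptions made for illustration, not AGCH's published formulation.

import numpy as np

def cosine_affinity(feats):
    # Pairwise cosine similarity between row-wise feature vectors.
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    return normed @ normed.T

def gaussian_affinity(feats, sigma=1.0):
    # Pairwise Gaussian-kernel similarity derived from Euclidean distances.
    sq = np.sum(feats ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
    return np.exp(-np.maximum(dist2, 0.0) / (2.0 * sigma ** 2))

def joint_affinity(img_feats, txt_feats, weights=(0.25, 0.25, 0.25, 0.25)):
    # Aggregate several per-modality similarity views into one joint matrix.
    # Uniform weights are a placeholder; AGCH's actual weighting may differ.
    views = [
        cosine_affinity(img_feats),
        cosine_affinity(txt_feats),
        gaussian_affinity(img_feats),
        gaussian_affinity(txt_feats),
    ]
    return sum(w * v for w, v in zip(weights, views))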
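Likewise, the model described in the abstract (modality-specific encoders, GCNs operating over the joint affinity matrix, and a fusion module producing unified codes) could be sketched as below in PyTorch. Everything here is a hypothetical reading of the abstract, not the authors' architecture: the layer sizes, the single-layer GCNs, the tanh relaxation of the binary codes, and the class name AGCHSketch are all assumptions.

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    # One graph-convolution step: propagate features over the affinity matrix.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Row-normalise the affinity matrix so propagation averages neighbours.
        adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1e-12)
        return torch.relu(self.linear(adj @ x))

class AGCHSketch(nn.Module):
    # Modality-specific encoders -> per-modality GCNs -> fusion -> relaxed codes.
    def __init__(self, img_dim, txt_dim, hid_dim=512, code_len=64):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Linear(img_dim, hid_dim), nn.ReLU())
        self.txt_enc = nn.Sequential(nn.Linear(txt_dim, hid_dim), nn.ReLU())
        self.img_gcn = GCNLayer(hid_dim, hid_dim)
        self.txt_gcn = GCNLayer(hid_dim, hid_dim)
        self.fusion = nn.Linear(2 * hid_dim, code_len)

    def forward(self, img, txt, adj):
        h_img = self.img_gcn(self.img_enc(img), adj)
        h_txt = self.txt_gcn(self.txt_enc(txt), adj)
        fused = torch.cat([h_img, h_txt], dim=1)
        # tanh gives continuous codes during training; sign() binarises at query time.
        return torch.tanh(self.fusion(fused))

At retrieval time one would take torch.sign of the output to obtain binary codes; the training objective (e.g., aligning code inner products with the joint affinity matrix) is omitted from this sketch.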
Pages: 466-479
Number of Pages: 14