Aggregation-Based Graph Convolutional Hashing for Unsupervised Cross-Modal Retrieval

Times Cited: 122
Authors
Zhang, Peng-Fei [1 ]
Li, Yang [1 ]
Huang, Zi [1 ]
Xu, Xin-Shun [2 ]
Affiliations
[1] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld 4072, Australia
[2] Shandong Univ, Sch Software, Jinan 250101, Shandong, Peoples R China
Funding
Australian Research Council
Keywords
Semantics; Convolutional codes; Binary codes; Convolution; Measurement; Feature extraction; Sparse matrices; Multimodal; unsupervised hashing; cross-modal search; graph convolutional networks; BINARY-CODES; ROBUST;
DOI
10.1109/TMM.2021.3053766
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Cross-modal hashing has attracted considerable attention in large-scale information retrieval for its storage and query efficiency. Despite the great success achieved by supervised approaches, existing unsupervised hashing methods still suffer from the lack of reliable learning guidance and from the discrepancy between modalities. In this paper, we propose Aggregation-based Graph Convolutional Hashing (AGCH) to tackle these obstacles. First, considering that a single similarity metric can hardly represent data relationships comprehensively, we develop an efficient aggregation strategy that utilises multiple metrics to construct a more precise affinity matrix for learning. Specifically, we apply various similarity measures to exploit the structural information of multiple modalities from different perspectives and then aggregate the obtained information into a joint similarity matrix. Furthermore, a novel deep model is designed to learn unified binary codes across different modalities; its key components are modality-specific encoders, Graph Convolutional Networks (GCNs) and a fusion module. The modality-specific encoders learn feature embeddings for each individual modality. On this basis, we leverage GCNs to further mine the semantic structure of the data, together with a fusion module that correlates the different modalities. Extensive experiments on three real-world datasets demonstrate that the proposed method significantly outperforms state-of-the-art competitors.
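The abstract outlines two main ingredients of AGCH: aggregating several per-modality similarity matrices into a joint affinity matrix, and propagating features through graph convolutions before binarisation. The sketch below illustrates that general idea only; it assumes cosine similarity per modality, uniform aggregation weights, a single symmetrically normalised GCN layer and sign-based binarisation, and every function and variable name is illustrative rather than taken from the paper.

```python
# Minimal sketch of similarity aggregation + graph convolution for hash learning.
# Assumptions (not from the paper): cosine similarity per modality, uniform
# aggregation weights, tanh activation, sign() binarisation.
import numpy as np

def cosine_similarity(feat):
    """Cosine similarity matrix for one modality's (n, d) feature array."""
    norm = feat / (np.linalg.norm(feat, axis=1, keepdims=True) + 1e-12)
    return norm @ norm.T

def aggregate_affinity(modal_feats, weights=None):
    """Combine per-modality similarity matrices into a joint affinity matrix."""
    sims = [cosine_similarity(f) for f in modal_feats]
    if weights is None:                      # uniform weights by assumption
        weights = [1.0 / len(sims)] * len(sims)
    return sum(w * s for w, s in zip(weights, sims))

def gcn_layer(adj, features, weight):
    """One graph-convolution step: add self-loops, normalise, propagate."""
    adj_hat = adj + np.eye(adj.shape[0])
    deg_inv_sqrt = 1.0 / np.sqrt(adj_hat.sum(axis=1))
    adj_norm = adj_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.tanh(adj_norm @ features @ weight)

# Toy usage: two modalities (e.g. image and text features) for 6 samples.
rng = np.random.default_rng(0)
img_feat, txt_feat = rng.normal(size=(6, 8)), rng.normal(size=(6, 5))
S = aggregate_affinity([img_feat, txt_feat])          # joint affinity matrix
H = gcn_layer(S, img_feat, rng.normal(size=(8, 4)))   # graph-convolved embedding
B = np.sign(H)                                        # binary hash codes
```

In the full model this joint affinity matrix would guide both modalities' encoders and the fusion module so that all modalities map to unified binary codes; the sketch only shows the image branch for brevity.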
Pages: 466-479
Number of Pages: 14