Label Consistent Flexible Matrix Factorization Hashing for Efficient Cross-modal Retrieval

Cited by: 36
Authors
Zhang, Donglin [1 ]
Wu, Xiao-Jun [1 ]
Yu, Jun [1 ]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, 1800 Lihu Ave, Wuxi 214122, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Hashing; cross-modal retrieval; flexible matrix factorization;
DOI
10.1145/3446774
CLC classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Hashing methods have sparked a great revolution in large-scale cross-media search due to their effectiveness and efficiency. Most existing approaches learn a unified hash representation in a common Hamming space to represent all multimodal data. However, unified hash codes may not characterize the cross-modal data discriminatively, because the modalities can vary greatly in their dimensionalities, physical properties, and statistical distributions. In addition, most existing supervised cross-modal algorithms preserve the similarity relationship by constructing an n × n pairwise similarity matrix, which requires a large amount of computation and loses the category information. To mitigate these issues, a novel cross-media hashing approach is proposed in this article, dubbed label consistent flexible matrix factorization hashing (LFMH). Specifically, LFMH jointly learns modality-specific latent subspaces with similar semantics by flexible matrix factorization. In addition, LFMH guides the hash learning by utilizing the semantic labels directly instead of the large n × n pairwise similarity matrix. LFMH transforms the heterogeneous data into modality-specific latent semantic representations; the hash codes are then obtained by quantizing these representations, and the learned codes are consistent with the supervised labels of the multimodal data. As a result, similar samples in each modality receive similar binary codes, and those codes characterize the samples flexibly. Accordingly, the derived hash codes have more discriminative power for single-modal and cross-modal retrieval tasks. Extensive experiments on eight different databases demonstrate that our model outperforms several competitive approaches.
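The pipeline described in the abstract (modality-specific latent representations learned by matrix factorization, label guidance replacing the n × n similarity matrix, sign quantization to binary codes) can be sketched as follows. This is a minimal illustration under an assumed least-squares objective ||X − VW||² + λ||L − VP||² per modality, with made-up update rules and parameter names; it is not the authors' actual LFMH optimization.

```python
import numpy as np

def cross_modal_hash_sketch(X_img, X_txt, labels, n_bits=16, n_iter=20, lam=1.0, seed=0):
    """Hedged sketch of label-guided matrix-factorization hashing.

    X_img: (n, d1) and X_txt: (n, d2) feature matrices; labels: (n, c) one-hot.
    Learns a modality-specific latent factor V per modality by alternating
    least squares on ||X - V W||^2 + lam * ||labels - V P||^2 (an illustrative
    objective, not the paper's exact formulation), then quantizes V by sign.
    """
    rng = np.random.default_rng(seed)
    n = X_img.shape[0]
    codes = {}
    for name, X in (("img", X_img), ("txt", X_txt)):
        V = rng.standard_normal((n, n_bits))  # modality-specific latent factors
        for _ in range(n_iter):
            # Basis W and label projection P: least squares given V
            W = np.linalg.lstsq(V, X, rcond=None)[0]       # (n_bits, d)
            P = np.linalg.lstsq(V, labels, rcond=None)[0]  # (n_bits, c)
            # V: closed form combining the feature and label terms
            A = W @ W.T + lam * (P @ P.T) + 1e-6 * np.eye(n_bits)
            B = X @ W.T + lam * (labels @ P.T)
            V = B @ np.linalg.inv(A)
        # Quantize the latent representation to binary codes in {-1, +1}
        codes[name] = np.sign(V)
        codes[name][codes[name] == 0] = 1
    return codes
```

At retrieval time, one would rank database codes of the opposite modality by Hamming distance to the query's code, e.g. `0.5 * (n_bits - query_code @ db_codes.T)`.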
Pages: 18
References
45 entries in total
  • [1] Baeza-Yates R. Modern Information Retrieval. 1999, Vol. 463.
  • [2] Cao Y., Long M., Wang J., Yang Q., Yu P. S. Deep Visual-Semantic Hashing for Cross-Modal Retrieval. KDD'16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016: 1445-1454.
  • [3] Chua T.-S., et al. Proceedings of the ACM International Conference on Image and Video Retrieval, 2009: 1. DOI 10.1145/1646396.1646452.
  • [4] Dalal N., Triggs B. Histograms of Oriented Gradients for Human Detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1, 2005: 886-893.
  • [5] Ding G., Guo Y., Zhou J. Collective Matrix Factorization Hashing for Multimodal Data. 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014: 2083-2090.
  • [6] Everingham M., Van Gool L., Williams C. K. I., Winn J., Zisserman A. The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 2010, 88(2): 303-338.
  • [7] FEIWANG PC. Information Retrieval, 2012, 15: 179.
  • [8] Gong Y., Ke Q., Isard M., Lazebnik S. A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics. International Journal of Computer Vision, 2014, 106(2): 210-233.
  • [9] Hu M., Yang Y., Shen F., Xie N., Hong R., Shen H. T. Collective Reconstructive Embeddings for Cross-Modal Hashing. IEEE Transactions on Image Processing, 2019, 28(6): 2770-2784.
  • [10] Huiskes M. J. Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, 2008: 39. DOI 10.1145/1460096.1460104.