Semantics-preserving hashing based on multi-scale fusion for cross-modal retrieval

Cited by: 3
Authors
Zhang, Hong [1 ,2 ]
Pan, Min [1 ,2 ]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci & Technol, Wuhan 430081, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Re, Wuhan, Peoples R China
Keywords
Cross-modal retrieval; Multi-scale fusion; Hash learning; Semantics preserving; Deep learning
DOI
10.1007/s11042-020-09869-4
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Hash-based cross-modal retrieval has been a research hotspot in content-based multimedia retrieval. Most deep cross-modal hashing methods consider only an inter-modal loss, which preserves the local information of the training data, while ignoring the loss within samples of the same modality, which preserves the global information of the dataset. They also overlook the fact that different scales of single-modal data carry different semantic information, which affects the feature representation. In this paper, we propose a semantics-preserving hashing method based on multi-scale fusion. Concretely, a multi-scale fusion pooling model is introduced into both the image feature network and the text feature network, so that multi-scale features can be extracted from images and the sparsity problem of text bag-of-words (BOW) vectors can be alleviated. When constructing the loss function, we account for the intra-modal loss alongside the inter-modal loss, so the output hash codes retain both the global and the local underlying semantic correlations after the two networks are trained. Experimental results on NUS-WIDE and MIRFlickr-25K show that our algorithm improves cross-modal retrieval accuracy compared with existing methods.
Pages: 17299-17314 (16 pages)
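
The abstract outlines two technical components: a multi-scale fusion pooling step used in both modality networks, and a loss that combines inter-modal and intra-modal terms. The paper's exact formulation is not reproduced in this record, so the following is only a minimal PyTorch sketch under assumptions: adaptive average pooling at hypothetical scales (1, 2, 4) stands in for the fusion pooling, a DCMH-style negative log-likelihood stands in for both loss terms, and the weighting hyperparameter alpha is assumed, not taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiScaleFusionPooling(nn.Module):
        """Hypothetical sketch: pool a conv feature map at several
        scales and concatenate the pooled vectors, in the spirit of
        spatial pyramid pooling."""
        def __init__(self, scales=(1, 2, 4)):
            super().__init__()
            self.scales = scales

        def forward(self, x):                    # x: (B, C, H, W)
            feats = []
            for s in self.scales:
                p = F.adaptive_avg_pool2d(x, s)  # (B, C, s, s)
                feats.append(p.flatten(1))       # (B, C*s*s)
            return torch.cat(feats, dim=1)       # fused multi-scale vector

    def pairwise_likelihood_loss(u, v, S):
        """Negative log-likelihood of pairwise similarities, a common
        cross-modal hashing surrogate. u, v: (B, K) real-valued codes;
        S: (B, B) binary semantic similarity matrix."""
        theta = 0.5 * (u @ v.t())                # inner-product similarity
        return (F.softplus(theta) - S * theta).mean()

    def total_loss(img_codes, txt_codes, S, alpha=1.0):
        """Inter-modal term plus intra-modal terms for each modality,
        so the codes preserve both cross-modal (local) and within-modal
        (global) semantic correlations; alpha is an assumed weight."""
        inter = pairwise_likelihood_loss(img_codes, txt_codes, S)
        intra = (pairwise_likelihood_loss(img_codes, img_codes, S)
                 + pairwise_likelihood_loss(txt_codes, txt_codes, S))
        return inter + alpha * intra

In a full pipeline the fused multi-scale features would be passed through fully connected layers ending in tanh to produce K-bit relaxed codes before binarization; those layers are omitted here for brevity.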