Semantics-preserving hashing based on multi-scale fusion for cross-modal retrieval

Cited by: 3
Authors
Zhang, Hong [1 ,2 ]
Pan, Min [1 ,2 ]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci & Technol, Wuhan 430081, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Re, Wuhan, Peoples R China
Keywords
Cross-modal retrieval; Multi-scale fusion; Hash learning; Semantics preserving; Deep learning
DOI
10.1007/s11042-020-09869-4
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Hash-based cross-modal retrieval has been a research hotspot in content-based multimedia retrieval. Most deep cross-modal hashing methods consider only an inter-modal loss, which preserves the local information of the training data, while ignoring the loss within samples of the same modality, which preserves the global information of the dataset. They also overlook the fact that different scales of single-modal data carry different semantic information, which affects the feature representation. In this paper, we propose a semantics-preserving hashing method based on multi-scale fusion. Concretely, a multi-scale fusion pooling model is introduced into both the image feature network and the text feature network, so that multi-scale features can be extracted from images and the sparsity problem of text bag-of-words (BOW) vectors can be alleviated. When constructing the loss function, we account for the intra-modal loss alongside the inter-modal loss, so the output hash codes retain both the global and the local underlying semantic correlations after the two networks are trained. Experimental results on NUS-WIDE and MIRFlickr-25K show that our algorithm improves cross-modal retrieval accuracy compared with existing methods.
Pages: 17299-17314 (16 pages)
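
The abstract outlines two technical components: a multi-scale fusion pooling step used in both modality networks, and a loss that combines inter-modal and intra-modal terms. The paper's exact formulation is not reproduced in this record, so the following is only a minimal PyTorch sketch under assumptions: adaptive average pooling at hypothetical scales (1, 2, 4) stands in for the fusion pooling, a DCMH-style negative log-likelihood stands in for both loss terms, and the weighting hyperparameter alpha is assumed, not taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiScaleFusionPooling(nn.Module):
        """Hypothetical sketch: pool a conv feature map at several
        scales and concatenate the pooled vectors, in the spirit of
        spatial pyramid pooling."""
        def __init__(self, scales=(1, 2, 4)):
            super().__init__()
            self.scales = scales

        def forward(self, x):                    # x: (B, C, H, W)
            feats = []
            for s in self.scales:
                p = F.adaptive_avg_pool2d(x, s)  # (B, C, s, s)
                feats.append(p.flatten(1))       # (B, C*s*s)
            return torch.cat(feats, dim=1)       # fused multi-scale vector

    def pairwise_likelihood_loss(u, v, S):
        """Negative log-likelihood of pairwise similarities, a common
        cross-modal hashing surrogate. u, v: (B, K) real-valued codes;
        S: (B, B) binary semantic similarity matrix."""
        theta = 0.5 * (u @ v.t())                # inner-product similarity
        return (F.softplus(theta) - S * theta).mean()

    def total_loss(img_codes, txt_codes, S, alpha=1.0):
        """Inter-modal term plus intra-modal terms for each modality,
        so the codes preserve both cross-modal (local) and within-modal
        (global) semantic correlations; alpha is an assumed weight."""
        inter = pairwise_likelihood_loss(img_codes, txt_codes, S)
        intra = (pairwise_likelihood_loss(img_codes, img_codes, S)
                 + pairwise_likelihood_loss(txt_codes, txt_codes, S))
        return inter + alpha * intra

In a full pipeline the fused multi-scale features would be passed through fully connected layers ending in tanh to produce K-bit relaxed codes before binarization; those layers are omitted here for brevity.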