MSSPQ: Multiple Semantic Structure-Preserving Quantization for Cross-Modal Retrieval

Times cited: 8
Authors
Zhu, Lei [1]
Cai, Liewu [1]
Song, Jiayu [2]
Zhu, Xinghui [1]
Zhang, Chengyuan [1]
Zhang, Shichao [2]
Affiliations
[1] Hunan Agr Univ, Coll Informat & Intelligence, Changsha, Hunan, Peoples R China
[2] Cent South Univ, Sch Comp Sci & Engn, Changsha, Hunan, Peoples R China
Source
PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022 | 2022
Funding
National Natural Science Foundation of China
Keywords
Cross-modal hashing; Semantic correlation; Quantization; Deep learning; NETWORK
DOI
10.1145/3512527.3531417
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cross-modal hashing is an active topic in the multimedia community; it aims to generate compact hash codes from multimedia content for efficient cross-modal search. Two challenges cannot be ignored: (1) how to efficiently enhance cross-modal semantic mining, which is essential for cross-modal hash code learning, and (2) how to combine the learning of multiple semantic correlations to improve semantic similarity preservation. To this end, this paper proposes a novel end-to-end cross-modal hashing approach, named Multiple Semantic Structure-Preserving Quantization (MSSPQ), which integrates a deep hashing model with multiple semantic correlation learning to boost hash learning performance. The multiple semantic correlation learning consists of inter-modal and intra-modal pairwise correlation learning and cosine correlation learning, which together comprehensively capture cross-modal consistent semantics and preserve semantic similarity. Extensive experiments on three multimedia datasets confirm that the proposed method outperforms the baselines.
Pages: 631-638
Page count: 8