HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval

Cited by: 36
Authors
Zhang, Chengyuan [1 ]
Song, Jiayu [2 ]
Zhu, Xiaofeng [3 ]
Zhu, Lei [4 ]
Zhang, Shichao [2 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Hunan, Peoples R China
[3] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610054, Sichuan, Peoples R China
[4] Hunan Agr Univ, Coll Informat & Intelligence, Changsha 410128, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; deep learning; intra-modal semantic correlation; hybrid cross-modal similarity;
DOI
10.1145/3412847
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
The purpose of cross-modal retrieval is to discover the relationships between samples of different modalities and, given a query sample from one modality, to retrieve semantically similar samples from other modalities. Because data from different modalities exhibit heterogeneous low-level features but semantically related high-level features, the central problem of cross-modal retrieval is how to measure similarity across modalities. In this article, we present a novel cross-modal retrieval method, named the Hybrid Cross-Modal Similarity Learning model (HCMSL for short). It aims to capture sufficient semantic information from both labeled and unlabeled cross-modal pairs and from intra-modal pairs sharing the same classification label. Specifically, coupled deep fully connected networks are used to map cross-modal feature representations into a common subspace. A weight-sharing strategy is utilized between the two branches of the networks to diminish cross-modal heterogeneity. Furthermore, two Siamese CNN models are employed to learn intra-modal similarity from samples of the same modality. Comprehensive experiments on real datasets clearly demonstrate that our proposed technique achieves substantial improvements over state-of-the-art cross-modal retrieval techniques.
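To make the coupled-branch design described in the abstract concrete, below is a minimal sketch in PyTorch of two modality-specific fully connected encoders whose final projection layer is weight-shared, so that image and text features land in a common subspace. This is an illustrative reconstruction, not the authors' released code; the input dimensions (4096 for image features, 300 for text features), layer sizes, and all names are hypothetical choices for the example.

import torch
import torch.nn as nn

class CoupledBranches(nn.Module):
    """Two fully connected branches with a weight-shared final projection."""

    def __init__(self, img_dim=4096, txt_dim=300, hidden_dim=1024, common_dim=256):
        super().__init__()
        # Modality-specific layers absorb the heterogeneous low-level features.
        self.img_fc = nn.Sequential(nn.Linear(img_dim, hidden_dim), nn.ReLU())
        self.txt_fc = nn.Sequential(nn.Linear(txt_dim, hidden_dim), nn.ReLU())
        # One shared linear layer applied to both branches implements the
        # weight-sharing strategy that diminishes cross-modal heterogeneity.
        self.shared = nn.Linear(hidden_dim, common_dim)

    def forward(self, img_feat, txt_feat):
        img_code = self.shared(self.img_fc(img_feat))
        txt_code = self.shared(self.txt_fc(txt_feat))
        return img_code, txt_code

model = CoupledBranches()
img = torch.randn(8, 4096)  # e.g., CNN image features (hypothetical)
txt = torch.randn(8, 300)   # e.g., averaged word embeddings (hypothetical)
img_code, txt_code = model(img, txt)
# Cosine similarity in the common subspace serves as a cross-modal score.
sim = nn.functional.cosine_similarity(img_code, txt_code)

In this sketch the sharing is realized by routing both branches through the same nn.Linear instance, so one set of projection weights receives gradients from both modalities; the intra-modal Siamese CNNs mentioned in the abstract would supply additional similarity supervision within each modality and are omitted here for brevity.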
Pages: 22