HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval

Cited by: 36
Authors
Zhang, Chengyuan [1 ]
Song, Jiayu [2 ]
Zhu, Xiaofeng [3 ]
Zhu, Lei [4 ]
Zhang, Shichao [2 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Hunan, Peoples R China
[3] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610054, Sichuan, Peoples R China
[4] Hunan Agr Univ, Coll Informat & Intelligence, Changsha 410128, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; deep learning; intra-modal semantic correlation; hybrid cross-modal similarity;
DOI
10.1145/3412847
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
The purpose of cross-modal retrieval is to discover the relationships between samples of different modalities and, given a query sample from one modality, to retrieve semantically similar samples from other modalities. Because data from different modalities exhibit heterogeneous low-level features but semantically related high-level features, the central problem of cross-modal retrieval is how to measure similarity across modalities. In this article, we present a novel cross-modal retrieval method, named the Hybrid Cross-Modal Similarity Learning model (HCMSL for short). It aims to capture sufficient semantic information from both labeled and unlabeled cross-modal pairs and from intra-modal pairs sharing the same classification label. Specifically, coupled deep fully connected networks are used to map cross-modal feature representations into a common subspace. A weight-sharing strategy is utilized between the two branches of the networks to diminish cross-modal heterogeneity. Furthermore, two Siamese CNN models are employed to learn intra-modal similarity from samples of the same modality. Comprehensive experiments on real datasets clearly demonstrate that our proposed technique achieves substantial improvements over state-of-the-art cross-modal retrieval techniques.
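To make the coupled-branch design described in the abstract concrete, below is a minimal sketch in PyTorch of two modality-specific fully connected encoders whose final projection layer is weight-shared, so that image and text features land in a common subspace. This is an illustrative reconstruction, not the authors' released code; the input dimensions (4096 for image features, 300 for text features), layer sizes, and all names are hypothetical choices for the example.

import torch
import torch.nn as nn

class CoupledBranches(nn.Module):
    """Two fully connected branches with a weight-shared final projection."""

    def __init__(self, img_dim=4096, txt_dim=300, hidden_dim=1024, common_dim=256):
        super().__init__()
        # Modality-specific layers absorb the heterogeneous low-level features.
        self.img_fc = nn.Sequential(nn.Linear(img_dim, hidden_dim), nn.ReLU())
        self.txt_fc = nn.Sequential(nn.Linear(txt_dim, hidden_dim), nn.ReLU())
        # One shared linear layer applied to both branches implements the
        # weight-sharing strategy that diminishes cross-modal heterogeneity.
        self.shared = nn.Linear(hidden_dim, common_dim)

    def forward(self, img_feat, txt_feat):
        img_code = self.shared(self.img_fc(img_feat))
        txt_code = self.shared(self.txt_fc(txt_feat))
        return img_code, txt_code

model = CoupledBranches()
img = torch.randn(8, 4096)  # e.g., CNN image features (hypothetical)
txt = torch.randn(8, 300)   # e.g., averaged word embeddings (hypothetical)
img_code, txt_code = model(img, txt)
# Cosine similarity in the common subspace serves as a cross-modal score.
sim = nn.functional.cosine_similarity(img_code, txt_code)

In this sketch the sharing is realized by routing both branches through the same nn.Linear instance, so one set of projection weights receives gradients from both modalities; the intra-modal Siamese CNNs mentioned in the abstract would supply additional similarity supervision within each modality and are omitted here for brevity.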
Pages: 22