Cross-modal Retrieval with Correspondence Autoencoder

Cited by: 429
Authors
Feng, Fangxiang [1]
Wang, Xiaojie [1]
Li, Ruifan [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14) | 2014
Funding
National Natural Science Foundation of China; National High-Tech Research and Development Program of China (863 Program);
Keywords
Cross-modal retrieval; image and text; deep learning; autoencoder;
DOI
10.1145/2647868.2654902
Chinese Library Classification (CLC) number
TP301 [Theory and Methods];
Subject classification code
081202;
Abstract
The problem of cross-modal retrieval, e.g., using a text query to search for images and vice versa, is considered in this paper. A novel model involving correspondence autoencoders (Corr-AE) is proposed for solving this problem. The model is constructed by correlating the hidden representations of two uni-modal autoencoders. A novel objective, which minimizes a linear combination of the representation learning error for each modality and the correlation learning error between the hidden representations of the two modalities, is used to train the model as a whole. Minimizing the correlation learning error forces the model to learn hidden representations that carry only the information common to the two modalities, while minimizing the representation learning error keeps the hidden representations good enough to reconstruct the input of each modality. A parameter alpha balances the representation learning error against the correlation learning error. Based on two different multi-modal autoencoders, Corr-AE is extended to two other correspondence models, called Corr-Cross-AE and Corr-Full-AE. The proposed models are evaluated on three publicly available data sets from real scenes. We demonstrate that the three correspondence autoencoders perform significantly better than three canonical correlation analysis based models and two popular multi-modal deep models on cross-modal retrieval tasks.
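The objective described in the abstract admits a compact form such as L = (1 - alpha)(L_I + L_T) + alpha * L_C, where L_I and L_T are the per-modality reconstruction errors and L_C is the correlation error between the two hidden codes. The sketch below, in PyTorch, is a minimal illustration of that structure, assuming squared-error reconstruction and a Euclidean distance between hidden codes as the correlation term; the single-layer encoders, layer sizes, and the names CorrAE and corr_ae_loss are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class CorrAE(nn.Module):
        # Two uni-modal autoencoders; their hidden codes are tied only
        # through the correlation term in the loss below. (Shapes and
        # single-layer encoders are assumptions for illustration.)
        def __init__(self, img_dim, txt_dim, code_dim):
            super().__init__()
            self.img_enc = nn.Sequential(nn.Linear(img_dim, code_dim), nn.Sigmoid())
            self.img_dec = nn.Linear(code_dim, img_dim)
            self.txt_enc = nn.Sequential(nn.Linear(txt_dim, code_dim), nn.Sigmoid())
            self.txt_dec = nn.Linear(code_dim, txt_dim)

        def forward(self, img, txt):
            h_img, h_txt = self.img_enc(img), self.txt_enc(txt)
            return self.img_dec(h_img), self.txt_dec(h_txt), h_img, h_txt

    def corr_ae_loss(model, img, txt, alpha=0.2):
        # (1 - alpha) * per-modality reconstruction errors
        #   + alpha * correlation error between hidden codes.
        img_rec, txt_rec, h_img, h_txt = model(img, txt)
        rec = ((img_rec - img) ** 2).sum(1) + ((txt_rec - txt) ** 2).sum(1)
        corr = ((h_img - h_txt) ** 2).sum(1)  # distance between hidden codes
        return ((1 - alpha) * rec + alpha * corr).mean()

Under this reading, alpha = 0 reduces the model to two independent autoencoders and alpha = 1 to pure correlation matching; at retrieval time both modalities would be mapped through their encoders into the shared code space and ranked by distance there.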
Pages: 7-16
Number of pages: 10