Online Asymmetric Similarity Learning for Cross-Modal Retrieval

Times cited: 34
Authors
Wu, Yiling [1, 2]
Wang, Shuhui [1]
Huang, Qingming [1, 2]
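Affiliations
[1] Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, People's Republic of China
[2] University of Chinese Academy of Sciences, Beijing, People's Republic of China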
Source
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017) | 2017
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR.2017.424
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-modal retrieval has attracted considerable attention in recent years. Measuring the semantic similarity between heterogeneous data objects is an essential yet challenging problem in cross-modal retrieval. In this paper, we propose an online method that learns the similarity function between heterogeneous modalities by preserving the relative similarity in the training data, modeled as a set of bi-directional hinge-loss constraints on cross-modal training triplets. The resulting online similarity-learning problem is optimized with the margin-based Passive-Aggressive algorithm. We further extend the approach to learn similarity functions in reproducing kernel Hilbert spaces by kernelizing it and combining multiple kernels, derived from different layers of CNN features, with the Hedging algorithm. Theoretical mistake bounds are given for our methods. Experiments on real-world datasets demonstrate the effectiveness of our methods.
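The abstract describes the learning rule only at a high level. The Python sketch below illustrates the general technique under stated assumptions, and is not the authors' implementation. It assumes a linear bilinear similarity S(x, y) = x^T W y between an image feature x and a text feature y (the kernelized RKHS version is not reproduced), applies a PA-I step in the style of Crammer et al. [5] to the image-query and text-query hinge constraints of each training triplet one after the other, and combines several per-view models (standing in for features from different CNN layers) with the Hedge multiplicative-weights rule of Freund and Schapire [8]. All function names, the margin of 1, the aggressiveness parameter C, the Hedge factor beta, and the definition of a ranking "mistake" are hypothetical choices made for illustration.

```python
import numpy as np


def pa_step(W, pos_pair, neg_pair, C=1.0, margin=1.0):
    """One PA-I update of a bilinear similarity S(x, y) = x^T W y (assumed form).

    Enforces S(xp, yp) >= S(xn, yn) + margin through the hinge loss
    max(0, margin - S(xp, yp) + S(xn, yn)). Returns (new W, loss before update).
    """
    (xp, yp), (xn, yn) = pos_pair, neg_pair
    loss = max(0.0, margin - xp @ W @ yp + xn @ W @ yn)
    if loss > 0.0:
        V = np.outer(xp, yp) - np.outer(xn, yn)  # negative gradient of the loss w.r.t. W
        sq_norm = float(np.sum(V * V))
        if sq_norm > 0.0:
            tau = min(C, loss / sq_norm)  # PA-I step size, capped by C
            W = W + tau * V
    return W, loss


def bidirectional_step(W, x, y, x_neg, y_neg, C=1.0):
    """Apply both triplet constraints for a matched image-text pair (x, y).

    Image query: (x, y) should outscore (x, y_neg).
    Text query:  (x, y) should outscore (x_neg, y).
    Handling the two constraints sequentially is an assumption of this sketch.
    """
    W, loss_img = pa_step(W, (x, y), (x, y_neg), C)
    W, loss_txt = pa_step(W, (x, y), (x_neg, y), C)
    return W, loss_img + loss_txt


def hedge_update(weights, mistakes, beta=0.5):
    """Hedge: multiply each expert's weight by beta**mistake, then renormalize."""
    w = weights * np.power(beta, mistakes)
    return w / w.sum()


def combined_score(Ws, weights, xs, ys):
    """Weighted sum of per-view bilinear scores; xs[k], ys[k] are view-k features."""
    return sum(b * (x @ W @ y) for b, W, x, y in zip(weights, Ws, xs, ys))


def online_round(Ws, weights, xs, ys, xs_neg, ys_neg, C=1.0, beta=0.5):
    """One online round over K hypothetical views: Hedge on mistakes, then PA per view."""
    # A view's mistake is the fraction of the two directions it ranks wrongly
    # (ties count as mistakes); this loss definition is an assumption.
    mistakes = np.array([
        0.5 * (float(x @ W @ y <= x @ W @ yn) + float(x @ W @ y <= xn @ W @ y))
        for W, x, y, xn, yn in zip(Ws, xs, ys, xs_neg, ys_neg)
    ])
    weights = hedge_update(weights, mistakes, beta)
    Ws = [bidirectional_step(W, x, y, xn, yn, C)[0]
          for W, x, y, xn, yn in zip(Ws, xs, ys, xs_neg, ys_neg)]
    return Ws, weights
```

When it is not capped by C, each PA-I step moves W just far enough to drive its own hinge loss to zero. The second (text-query) step can partially undo the first, however, which is one reason a joint update over both constraints may be preferable to this sequential version.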
Pages: 3984-3993
Number of pages: 10
References
41 in total
[1] Andrew G., 2013. Proc. International Conference on Machine Learning (ICML), p. 1247.
[2] Bai B., Weston J., Grangier D., Collobert R., Sadamasa K., Qi Y., Chapelle O., Weinberger K., 2010. Learning to rank with (a lot of) word features. Information Retrieval, 13(3): 291-314.
[3] Chechik G., 2010. Journal of Machine Learning Research, 11: 1109.
[4] Chua T.-S., 2009. Proc. ACM International Conference on Image and Video Retrieval, p. 1.
[5] Crammer K., 2006. Journal of Machine Learning Research, 7: 551.
[6] Davis J.V., 2007. Proc. 24th International Conference on Machine Learning (ICML), p. 209. DOI 10.1145/1273496.1273523.
[7] Everingham M., Van Gool L., Williams C.K.I., Winn J., Zisserman A., 2010. The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 88(2): 303-338.
[8] Freund Y., Schapire R.E., 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1): 119-139.
[9] Globerson A., 2005. Advances in Neural Information Processing Systems, p. 451. DOI 10.5555/2976248.2976305.
[10] Grangier D., Bengio S., 2008. A discriminative kernel-based model to rank images from text queries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(8): 1371-1384.