Discrete Semantic Alignment Hashing for Cross-Media Retrieval

Cited by: 27
Authors
Yao, Tao [1, 2]
Kong, Xiangwei [3 ]
Fu, Haiyan [4 ]
Tian, Qi [5 ]
Affiliations
[1] Ludong Univ, Dept Informat & Elect Engn, Yantai 264000, Peoples R China
[2] Southwest Jiaotong Univ, Yantai Res Inst New Generat Informat Technol, Yantai 264004, Peoples R China
[3] Zhejiang Univ, Dept Data Sci & Engn Management, Hangzhou 310058, Peoples R China
[4] Dalian Univ Technol, Dept Informat & Commun Engn, Dalian 116024, Peoples R China
[5] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA
Funding
National Natural Science Foundation of China;
Keywords
Semantics; Hash functions; Correlation; Quantization (signal); Optimization; Task analysis; Internet; Attribute; collaborative filtering; cross-media retrieval; hashing; OBJECTS;
DOI
10.1109/TCYB.2019.2912644
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Cross-media hashing, which maps data from different modalities into a shared low-dimensional Hamming space, has attracted considerable attention due to the rapid growth of multimodal data such as images and texts. Recent cross-media hashing methods mainly aim to learn compact hash codes that preserve class-label-based or feature-based similarities among samples. However, these methods ignore the unbalanced semantic gaps between the different modalities and high-level semantic concepts, which generally leads to less effective hash functions and unsatisfactory retrieval performance. Specifically, the keywords of texts carry semantic meaning, while the low-level features of images lack it, so the semantic gap is larger in the image modality than in the text modality. In this paper, we propose a simple yet effective hashing method for cross-media retrieval, dubbed discrete semantic alignment hashing (DSAH), to address this problem. First, DSAH exploits collaborative filtering to mine the relations between class labels and hash codes, which reduces memory consumption and computational cost compared with pairwise similarity. Then, attributes of the image modality are employed to align its semantic information with that of the text modality. Finally, to further improve the quality of the hash codes, we propose a discrete optimization algorithm that learns discrete hash codes directly, with a closed-form solution for each bit. Extensive experiments on multiple public databases show that our model can seamlessly incorporate attributes and achieve promising performance.
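As a rough illustration of the retrieval setting the abstract describes, the sketch below maps two modalities into one Hamming space and ranks database items by Hamming distance to a query. It is not the DSAH method: the random projection matrices `W_img` and `W_txt`, the feature dimensions, and the helper names `hash_codes` and `hamming_rank` are all assumptions made for this example, standing in for hash functions that a real method would learn.

```python
import numpy as np


def hash_codes(features, projection):
    """Binarize linear projections: 1 where the projection is non-negative, else 0."""
    return (features @ projection >= 0).astype(np.uint8)


def hamming_rank(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query, and the distances."""
    # Count differing bits between the query and every database code.
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    # A stable sort keeps the original order among equally distant items.
    order = np.argsort(distances, kind="stable")
    return order, distances


rng = np.random.default_rng(0)
n_bits = 16

# Hypothetical stand-ins for learned, modality-specific hash functions.
W_img = rng.standard_normal((128, n_bits))  # assumed 128-d image features
W_txt = rng.standard_normal((10, n_bits))   # assumed 10-d text features

image_db = rng.standard_normal((5, 128))    # toy image database
text_query = rng.standard_normal(10)        # toy text query

db_codes = hash_codes(image_db, W_img)
q_code = hash_codes(text_query, W_txt)

# Text-to-image retrieval: rank images by Hamming distance to the text query.
order, distances = hamming_rank(q_code, db_codes)
```

Because both modalities end up as binary codes of the same length, the comparison reduces to counting differing bits, which is why the quality of the learned codes, rather than the distance computation, is what methods in this area compete on.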
Pages: 4896-4907
Number of pages: 12