Deep Neural Architecture for Multi-Modal Retrieval based on Joint Embedding Space for Text and Images

Cited by: 14
Authors
Balaneshin-kordan, Saeid [1]
Kotov, Alexander [1]
Affiliations
[1] Wayne State Univ, Detroit, MI 48202 USA
Source
WSDM'18: PROCEEDINGS OF THE ELEVENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING | 2018
Keywords
Multi-Modal IR; Cross-Modal IR; Deep Neural Networks
DOI
10.1145/3159652.3159735
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances in deep learning and distributed representations of images and text have led to several neural architectures for cross-modal retrieval tasks, such as searching collections of images in response to textual queries and assigning textual descriptions to images. However, the multi-modal retrieval scenario, in which a query can be either a text or an image and the goal is to retrieve both a textual fragment and an image that should be treated as an atomic unit, has been studied significantly less. In this paper, we propose a gated neural architecture that projects image and keyword queries, as well as multi-modal retrieval units, into the same low-dimensional embedding space and performs semantic matching in that space. The proposed architecture is trained to minimize a structured hinge loss and can be applied to both cross- and multi-modal retrieval. Experimental results for six different cross- and multi-modal retrieval tasks on publicly available datasets indicate that the proposed architecture achieves superior retrieval accuracy compared to state-of-the-art baselines.
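The abstract mentions matching in a joint embedding space and training with a hinge loss, but gives no formula. As a rough illustration only, the sketch below shows a generic max-margin ranking loss over already-embedded vectors. It is not the paper's structured hinge loss or its gated architecture: the function names, the use of cosine similarity, the in-batch choice of negatives, and the margin value of 0.2 are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hinge_ranking_loss(queries, units, margin=0.2):
    """Generic max-margin ranking loss (hypothetical helper, not from the paper).

    queries[i] is assumed to match units[i]; every other unit in the
    batch is treated as a negative for that query.
    """
    total = 0.0
    for i, q in enumerate(queries):
        pos = cosine(q, units[i])
        for j, u in enumerate(units):
            if j != i:
                # Penalize a negative whose score comes within `margin` of the positive.
                total += max(0.0, margin - pos + cosine(q, u))
    return total


# Toy embeddings: each query is aligned with its own retrieval unit.
aligned = hinge_ranking_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# Same queries, but the matching units are swapped.
swapped = hinge_ranking_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With the toy vectors above, the aligned pairs incur no loss, while the swapped pairs are penalized, which is the behavior a margin-based objective is meant to encourage during training.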
Pages: 28-36
Number of pages: 9