Recursive Pyramid Network with Joint Attention for Cross-Media Retrieval

Cited by: 4
Authors
Yuan, Yuxin [1]
Peng, Yuxin [1]
Affiliation
[1] Peking Univ, Inst Comp Sci & Technol, Beijing, Peoples R China
Source
MULTIMEDIA MODELING, MMM 2018, PT I | 2018 / Vol. 10704
Funding
National Natural Science Foundation of China;
Keywords
Cross-media retrieval; Recursive pyramid network; Joint attention; REPRESENTATION;
DOI
10.1007/978-3-319-73603-7_33
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Cross-media retrieval has attracted wide attention in recent years for its flexibility in retrieving results across different media types with a query of any media type. Besides studying the global information of samples, some recent works focus on sample regions to mine local information for better correlation learning across media types. However, these works model only the correlations between regions and the sample, ignoring the correlations among regions: the significance of each region relative to all the others, and the supplementary information between a region and its sub-regions, analogous to that between the sample and its regions. To address this problem, this paper proposes a new recursive pyramid network with joint attention (RPJA) for cross-media retrieval, with two main contributions: (1) We repeatedly partition the sample into increasingly fine regions in a pyramid structure, and generate the sample representation by modeling the supplementary information provided by the regions and their sub-regions, recursively from the bottom to the top of the pyramid. (2) We propose a joint attention model connecting different media types at each pyramid level, which mines intra-media information and inter-media correlations to guide the learning of each region's significance, further improving correlation learning. Experiments on two widely used datasets, compared against state-of-the-art methods, verify the effectiveness of the proposed approach.
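The two ideas in the abstract (bottom-up recursion over a pyramid of regions, and attention weights driven by both intra-media and inter-media signals) can be illustrated with a toy sketch. This is not the RPJA network itself: the abstract gives no architectural details, so the two-way split, mean pooling, dot-product scoring, additive fusion, and the names `region_representation` and `other_media` are all assumptions chosen for illustration.

```python
import math

def mean_pool(vectors):
    """Average a list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def region_representation(features, other_media, depth):
    """Represent a region recursively, from the bottom of the pyramid upward.

    features:    list of feature vectors covering this region (assumed input)
    other_media: one vector from the paired media type (assumed input)
    depth:       number of pyramid levels below this region
    """
    own = mean_pool(features)
    if depth == 0 or len(features) < 2:
        return own  # finest level: no sub-regions to draw on

    # Assumption: each region is split into two halves (a real model
    # might use a spatial grid for images or phrases for text).
    mid = len(features) // 2
    subs = [region_representation(part, other_media, depth - 1)
            for part in (features[:mid], features[mid:])]

    # Toy "joint attention": an intra-media score (sub-region vs. its parent)
    # plus an inter-media score (sub-region vs. the paired media vector).
    weights = softmax([dot(s, own) + dot(s, other_media) for s in subs])

    # Supplementary information: attention-weighted sum of sub-regions,
    # added to the region's own pooled feature.
    supplement = [sum(w * s[i] for w, s in zip(weights, subs))
                  for i in range(len(own))]
    return [o + p for o, p in zip(own, supplement)]
```

Calling `region_representation(features, other_media, depth=2)` on a sequence of local features yields one vector for the whole sample; sub-regions that agree with the paired media vector receive larger weights and therefore contribute more to their parent's representation.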
Pages: 405-416
Page count: 12