Attention Alignment Multimodal LSTM for Fine-Grained Common Space Learning

Cited by: 16
Authors
Chen, Sijia [1 ]
Song, Bin [1 ]
Guo, Jie [1 ]
Affiliations
[1] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal data fusion; phrase localization; fine-grained common space; attention alignment; hierarchical multimodal LSTM;
DOI
10.1109/ACCESS.2018.2822663
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
We address common space learning, an approach that maps all related multimodal information into a common space. To establish a fine-grained common space, the aligned relevant local information of different modalities is used to learn a common subspace, in which the projected fragment information is further integrated according to intra-modal semantic relationships. Specifically, we propose a novel multimodal LSTM with an attention alignment mechanism, namely the attention alignment multimodal LSTM (AAM-LSTM), which mainly comprises an attention alignment recurrent network (AA-R) and a hierarchical multimodal LSTM (HM-LSTM). Unlike traditional methods, which operate directly on full modal data, the proposed model exploits the inter-modal and intra-modal semantic relationships of local information to jointly establish a uniform representation of multimodal data. AA-R automatically captures semantically aligned local information to learn the common subspace without requiring supervised alignment labels; HM-LSTM then leverages the latent relationships among these local fragments to learn a fine-grained common space. Experimental results on Flickr30K, Flickr8K, and Flickr30K Entities verify the performance and effectiveness of our model, which compares favorably with state-of-the-art methods. In particular, the phrase localization experiment with AA-R on Flickr30K Entities shows the expected accurate attention alignment. Moreover, the results of the image-sentence retrieval tasks show that the proposed AAM-LSTM outperforms benchmark algorithms by a large margin.
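The abstract describes a two-stage pattern: soft attention that aligns local fragments across modalities into a common subspace, followed by a hierarchical LSTM that integrates those fragments. The sketch below is a minimal, illustrative PyTorch rendering of that general pattern only; all module names, dimensions, and the scaled dot-product scoring are assumptions of this sketch, not the paper's exact AA-R/HM-LSTM formulation.

```python
# Minimal sketch of attention-aligned common-space learning, assuming PyTorch.
# Names (AttentionAlign, HierarchicalLSTM, region_dim, ...) are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAlign(nn.Module):
    """For each word, softly attend over image regions and project both the
    attended visual context and the word into a shared subspace."""
    def __init__(self, region_dim, word_dim, common_dim):
        super().__init__()
        self.img_proj = nn.Linear(region_dim, common_dim)
        self.txt_proj = nn.Linear(word_dim, common_dim)

    def forward(self, regions, words):
        # regions: (B, R, region_dim), words: (B, T, word_dim)
        v = self.img_proj(regions)                                   # (B, R, C)
        w = self.txt_proj(words)                                     # (B, T, C)
        # Scaled dot-product alignment scores between words and regions.
        scores = torch.bmm(w, v.transpose(1, 2)) / v.size(-1) ** 0.5  # (B, T, R)
        attn = F.softmax(scores, dim=-1)      # soft alignment, no labels needed
        aligned_v = torch.bmm(attn, v)        # visual context per word (B, T, C)
        return w, aligned_v, attn

class HierarchicalLSTM(nn.Module):
    """Integrate the aligned fragment embeddings into a single
    common-space vector via two stacked recurrent levels."""
    def __init__(self, common_dim, hidden_dim):
        super().__init__()
        self.frag_lstm = nn.LSTM(common_dim, hidden_dim, batch_first=True)
        self.top_lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, fragments):
        low, _ = self.frag_lstm(fragments)   # fragment-level states
        high, _ = self.top_lstm(low)         # higher-level integration
        return high[:, -1]                   # final common-space embedding

if __name__ == "__main__":
    B, R, T = 2, 36, 12                      # batch, regions, words
    align = AttentionAlign(region_dim=2048, word_dim=300, common_dim=512)
    agg = HierarchicalLSTM(common_dim=512, hidden_dim=512)
    regions = torch.randn(B, R, 2048)        # e.g. CNN region features
    words = torch.randn(B, T, 300)           # e.g. word embeddings
    w, aligned_v, attn = align(regions, words)
    sent_vec = agg(w)                        # text side of the common space
    img_vec = agg(aligned_v)                 # aligned visual side
    print(sent_vec.shape, img_vec.shape, attn.shape)
```

In training, one would typically add a bidirectional ranking loss that pulls matching (img_vec, sent_vec) pairs together and pushes mismatched pairs apart; the abstract does not specify the objective, so it is omitted here.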
Pages: 20195-20208
Number of pages: 14